A segment of a lane with a given adjacent boundary.
Used in:
The index into the lane's polyline where this lane boundary starts.
The index into the lane's polyline where this lane boundary ends.
The adjacent boundary feature ID of the MapFeature for the boundary. This can either be a RoadLine feature or a RoadEdge feature.
The adjacent boundary type. If the boundary is a road edge instead of a road line, this will be set to TYPE_UNKNOWN.
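An illustrative sketch of using a boundary segment: slice the lane's polyline between the start and end indices above to get the portion of the lane that runs along the referenced boundary feature. The lane_start_index / lane_end_index accessor names are assumptions standing in for the index fields described above.

```python
def lane_points_along_boundary(lane_polyline, boundary_segment):
    """Points of the lane polyline covered by this boundary segment (inclusive)."""
    start = boundary_segment.lane_start_index  # assumed field name, see comments above
    end = boundary_segment.lane_end_index      # assumed field name, see comments above
    return lane_polyline[start:end + 1]
```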
Used in:
1-d array of [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3]. Note that these intrinsics correspond to the images after scaling. Camera model: pinhole camera. Lens distortion: radial distortion coefficients k1, k2, k3; tangential distortion coefficients p1, p2. k1, k2, k3, p1, p2 follow the same definitions as OpenCV. https://en.wikipedia.org/wiki/Distortion_(optics) https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
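A minimal sketch of unpacking this intrinsic vector into an OpenCV-style camera matrix and distortion-coefficient vector (the [k1, k2, p1, p2, k3] ordering follows OpenCV's convention):

```python
import numpy as np

def unpack_intrinsics(intrinsic):
    """Split the 9-element intrinsic vector into camera matrix and distortion coeffs."""
    f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3 = intrinsic
    camera_matrix = np.array([[f_u, 0.0, c_u],
                              [0.0, f_v, c_v],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.array([k1, k2, p1, p2, k3])  # OpenCV ordering
    return camera_matrix, dist_coeffs
```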
Camera frame to vehicle frame.
Camera image size.
Used in:
All timestamps in this proto are represented as seconds since Unix epoch.
Used in:
JPEG image.
SDC pose.
SDC velocity at 'pose_timestamp' below. The velocity value is represented in the *global* frame. With this velocity, the pose can be extrapolated:
r(t+dt) = r(t) + dr/dt * dt, where dr/dt = v_{x,y,z}.
dR(t)/dt = W*R(t), where W = SkewSymmetric(w_{x,y,z}).
This differential equation solves to R(t) = exp(Wt)*R(0) if W is constant. When dt is small: R(t+dt) = (I + W*dt)*R(t).
Here r(t) = (x(t), y(t), z(t)) is the vehicle location at time t in the global frame, R(t) is the 3x3 rotation matrix from the body frame to the global frame at time t, and SkewSymmetric(x,y,z) is the cross-product matrix defined in https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication
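A minimal numpy sketch of the small-dt extrapolation above (illustrative only, not the dataset's pose utilities; note the first-order update does not keep R exactly orthonormal):

```python
import numpy as np

def skew(w):
    """Cross-product matrix SkewSymmetric(w_x, w_y, w_z)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def extrapolate_pose(r, R, v, w, dt):
    """r(t+dt) = r(t) + v*dt and R(t+dt) ~= (I + W*dt) R(t) for small dt."""
    r_new = np.asarray(r) + np.asarray(v) * dt
    R_new = (np.eye(3) + skew(w) * dt) @ np.asarray(R)
    return r_new, R_new
```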
Timestamp of the `pose` above.
Rolling shutter params. The following explanation assumes left->right rolling shutter. Rolling shutter cameras expose and read the image column by column, offset by the read out time for each column. The desired timestamp for each column is the middle of the exposure of that column, as outlined below for an image with 3 columns:
------time------>
|---- exposure col 1----| read |
-------|---- exposure col 2----| read |
--------------|---- exposure col 3----| read |
^trigger time                                  ^readout end time
^time for row 1 (= middle of exposure of row 1)
   ^time image center (= middle of exposure of middle row)
Shutter duration in seconds. Exposure time per column.
Time when the sensor was triggered and when last readout finished. The difference between trigger time and readout done time includes the exposure time and the actual sensor readout time.
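As a hedged illustration of these timing fields, the sketch below approximates the center-of-exposure time of one image column by interpolating between the trigger and readout-done times; the function and its linear-sweep assumption are illustrative, not the dataset's exact timing model.

```python
def approx_column_time(col, image_width, trigger_time, readout_done_time, shutter):
    """Approximate middle-of-exposure timestamp (seconds) for one image column.

    Assumes the column sweep is linear between trigger and readout-done and that
    each column is exposed for `shutter` seconds (see the diagram above).
    """
    sweep = (readout_done_time - trigger_time) - shutter  # time spent sweeping columns
    column_start = trigger_time + sweep * col / max(image_width - 1, 1)
    return column_start + 0.5 * shutter  # middle of that column's exposure
```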
Panoptic segmentation labels for this camera image. NOTE: Not every image has panoptic segmentation labels.
The camera labels associated with a given camera image. This message indicates the ground truth information for the camera image recorded by the given camera. If there are no labeled objects in the image, then the labels field is empty.
Used in:
(message has no fields)
Used in:
Panoptic (instance + semantic) segmentation labels for a given camera image. Associations can also be provided between each instance ID and a globally unique ID across all frames.
Used in:
The value used to separate instance_ids from different semantic classes. See the panoptic_label field for how this is used. Must be set to be greater than the maximum instance_id.
A uint16 png encoded image, with the same resolution as the corresponding camera image. Each pixel contains a panoptic segmentation label, which is computed as: semantic_class_id * panoptic_label_divisor + instance_id. We set instance_id = 0 for pixels for which there is no instance_id. NOTE: Instance IDs in this label are only consistent within this camera image. Use instance_id_to_global_id_mapping to get cross-camera consistent instance IDs.
The sequence id for this label. The above instance_id_to_global_id_mapping is only valid with other labels with the same sequence id.
A uint8 png encoded image, with the same resolution as the corresponding camera image. The value on each pixel indicates the number of cameras that overlap with this pixel. Used for the weighted Segmentation and Tracking Quality (wSTQ) metric.
A mapping between each panoptic label with an instance_id and a globally unique ID across all frames within the same sequence. This can be used to match instances across cameras and over time. i.e. instances belonging to the same object will map to the same global ID across all frames in the same sequence. NOTE: These unique IDs are not consistent with other IDs in the dataset, e.g. the bounding box IDs.
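A minimal sketch of decoding the panoptic_label PNG with the divisor arithmetic described above (assuming numpy and Pillow are available; variable names are illustrative):

```python
import io

import numpy as np
from PIL import Image

def decode_panoptic(panoptic_label_png: bytes, panoptic_label_divisor: int):
    """Split the uint16 panoptic image into semantic class and instance id maps."""
    panoptic = np.asarray(Image.open(io.BytesIO(panoptic_label_png)))
    semantic_class = panoptic // panoptic_label_divisor
    instance_id = panoptic % panoptic_label_divisor  # 0 where no instance exists
    return semantic_class, instance_id
```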
Used in:
If false, the corresponding instance will not have consistent global ids between frames.
Lidar data of a frame.
Used in:
The Lidar data for each timestamp.
Laser calibration data has the same length as that of lasers.
Poses of the SDC corresponding to the track states for each step in the scenario, similar to the one in the Frame proto.
Compressed Laser data.
Used in:
Range image is a 2d tensor. The first dimension (rows) represents pitch. The second dimension (columns) represents yaw. Zlib compressed range images include: Raw range image: raw range image with a non-empty 'range_image_pose_delta_compressed', which tells the vehicle pose of each range image cell. NOTE: 'range_image_pose_delta_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized DeltaEncodedData message version which stores MatrixFloat. To decompress (please see the documentation for lidar delta encoding): string val = delta_encoder.decompress(range_image_delta_compressed); MatrixFloat range_image; range_image.ParseFromString(val); Inner dimensions are: * channel 0: range * channel 1: intensity * channel 2: elongation * channel 3: is in any no label zone.
Zlib compressed [H, W, 6] serialized DeltaEncodedData message version which stores MatrixFloat. To decompress (please see the documentation for lidar delta encoding): string val = delta_encoder.decompress(range_image_pose_delta_compressed); MatrixFloat range_image_pose; range_image_pose.ParseFromString(val); Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_delta_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Used in:
A unique name that identifies the frame sequence.
Some stats for the run segment used.
Used in:
Day, Dawn/Dusk, or Night, determined from sun elevation.
Human readable location (e.g. CHD, SF) of the run segment.
Currently either Sunny or Rain.
Used in:
The number of unique objects with the type in the segment.
Used in:
The polygon defining the outline of the crosswalk. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Delta Encoded data structure. The protobuf compressed mask and residual data and the compressed data is encoded via zlib: compressed_bytes = zlib.compress( metadata + data_bytes + mask_bytes + residuals_bytes) The range_image_delta_compressed and range_image_pose_delta_compressed in the CompressedRangeImage are both encoded using this method.
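A heavily hedged sketch of peeling off the outer zlib layer described above. The import path is an assumption (the DeltaEncodedData message is named in these comments, but its generated module may differ in your release), and the inner mask/residual decoding should be done with the dataset's own lidar delta-encoding utilities.

```python
import zlib

# Hypothetical import path for the DeltaEncodedData message referenced above.
from waymo_open_dataset.protos import compressed_lidar_pb2

def parse_delta_encoded(blob: bytes):
    """zlib-decompress a *_delta_compressed blob and parse the container message."""
    delta = compressed_lidar_pb2.DeltaEncodedData()
    delta.ParseFromString(zlib.decompress(blob))
    return delta  # metadata, mask and residual fields then feed the delta decoder
```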
Used in:
The polygon defining the outline of the driveway region. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
The dynamic map information at a single time step.
Used in:
The traffic signal states for all observed signals at this time step.
Used in:
The timestamp associated with the dynamic feature data.
The set of traffic signal states for the associated time step.
This context is the same for all frames belonging to the same driving run segment. Use context.name to identify frames belonging to the same driving segment. We do not store all frames from one driving segment in one proto to avoid huge protos.
Frame start time, which is the timestamp of the first top LiDAR scan within this frame. Note that this timestamp does not correspond to the provided vehicle pose (pose).
Frame vehicle pose. Note that unlike in CameraImage, the Frame pose does not correspond to the provided timestamp (timestamp_micros). Instead, it roughly (but not exactly) corresponds to the vehicle pose in the middle of the given frame. The frame vehicle pose defines the coordinate system which the 3D laser labels are defined in.
The camera images.
The LiDAR sensor data.
Native 3D labels that correspond to the LiDAR sensor data. The 3D labels are defined w.r.t. the frame vehicle pose coordinate system (pose).
The native 3D LiDAR labels (laser_labels) projected to camera images. A projected label is the smallest image axis aligned rectangle that can cover all projected points from the 3d LiDAR label. The projected label is ignored if the projection is fully outside a camera image. The projected label is clamped to the camera image if it is partially outside.
Native 2D camera labels. Note that if a camera identified by CameraLabels.name has an entry in this field, then it has been labeled, even though it is possible that there are no labeled objects in the corresponding image, which is identified by a zero sized CameraLabels.labels.
No label zones in the *global* frame.
Map features. Only the first frame in a segment will contain map data. This field will be empty for other frames as the map is identical for all frames.
Map pose offset. This offset must be added to lidar points from this frame to compensate for pose drift and align with the map features.
Used in:
Object ID.
Difficulty level for detection problem.
Difficulty level for tracking problem.
The total number of lidar points in this box.
The total number of top lidar points in this box.
Used if the Label is a part of `Frame.laser_labels`.
Used if the Label is a part of `Frame.camera_labels`.
Used by Lidar labels to store in which camera it is mostly visible.
Used by Lidar labels to store a camera-synchronized box corresponding to the camera indicated by `most_visible_camera_name`. Currently, the boxes are shifted to the time when the most visible camera captures the center of the box, taking into account the rolling shutter of that camera. Specifically, given the object box living at the start of the Open Dataset frame (t_frame) with center position (c) and velocity (v), we aim to find the camera capture time (t_capture), when the camera indicated by `most_visible_camera_name` captures the center of the object. To this end, we solve the rolling shutter optimization considering both ego and object motion:
t_capture = image_column_to_time(
    camera_projection(c + v * (t_capture - t_frame),
                      transform_vehicle(t_capture - t_ref),
                      cam_params)),
where transform_vehicle(t_capture - t_ref) is the vehicle transform from a pose reference time t_ref to t_capture considering the ego motion, and cam_params is the camera extrinsic and intrinsic parameters. We then move the label box to t_capture by updating the center of the box as follows:
c_camera_synced = c + v * (t_capture - t_frame),
while keeping the box dimensions and heading direction. We use the camera_synced_box as the ground truth box for the 3D Camera-Only Detection Challenge. This makes the assumption that the users provide the detection at the same time as the most visible camera captures the object center.
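A toy numeric sketch of the center shift above (the values are made up for illustration; only the box center moves, dimensions and heading stay unchanged):

```python
import numpy as np

c = np.array([10.0, 2.0, 1.0])   # box center at t_frame, meters
v = np.array([5.0, 0.0, 0.0])    # object velocity, m/s
t_frame, t_capture = 0.0, 0.045  # seconds; t_capture comes from the rolling shutter solve

c_camera_synced = c + v * (t_capture - t_frame)  # shifted center of camera_synced_box
```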
Information to cross reference between labels for different modalities.
Used in:
Currently only CameraLabels with class `TYPE_PEDESTRIAN` store information about associated lidar objects.
Upright box, zero pitch and roll.
Used in:
Box coordinates in vehicle frame.
Dimensions of the box. length: dim x. width: dim y. height: dim z.
The heading of the bounding box (in radians). The heading is the angle required to rotate +x to the surface normal of the box front face. It is normalized to [-pi, pi).
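A one-line sketch for wrapping an arbitrary heading into the [-pi, pi) range mentioned above:

```python
import math

def normalize_heading(theta: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi
```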
7-DOF 3D (a.k.a upright 3D box).
5-DOF 2D. Mostly used for laser top down representation.
Axis aligned 2D. Mostly used for image.
The difficulty level of this label. The higher the level, the harder it is.
Used in:
Used in:
Used in:
Used in:
The speed limit for this lane.
True if the lane interpolates between two other lanes.
The polyline data for the lane. A polyline is a list of points with segments defined between consecutive points.
A list of IDs for lanes that this lane may be entered from.
A list of IDs for lanes that this lane may exit to.
The boundaries to the left of this lane. There may be different boundary types along this lane. Each BoundarySegment defines a section of the lane with a given boundary feature to the left. Note that some lanes do not have any boundaries (i.e. lane centers in intersections).
The boundaries to the right of this lane. See left_boundaries for details.
A list of neighbors to the left of this lane. Neighbor lanes include only adjacent lanes going the same direction.
A list of neighbors to the right of this lane. Neighbor lanes include only adjacent lanes going the same direction.
Type of this lane.
Used in:
Used in:
The feature ID of the neighbor lane.
The self adjacency segment. The other lane may only be a neighbor for only part of this lane. These indices define the points within this lane's polyline for which feature_id is a neighbor. If the lanes are neighbors at disjoint places (e.g., a median between them appears and then goes away) multiple neighbors will be listed. A lane change can only happen from this segment of this lane into the segment of the neighbor lane defined by neighbor_start_index and neighbor_end_index.
The neighbor adjacency segment. These indices define the valid portion of the neighbor lane's polyline where that lane is a neighbor to this lane. A lane change can only happen into this segment of the neighbor lane from the segment of this lane defined by self_start_index and self_end_index.
A list of segments within the self adjacency segment that have different boundaries between this lane and the neighbor lane. Each entry in this field contains the boundary type between this lane and the neighbor lane along with the indices into this lane's polyline where the boundary type begins and ends.
Used in:
Used in:
If non-empty, the beam pitch (in radians) is non-uniform. When constructing a range image, this mapping is used to map from beam pitch to range image row. If this is empty, we assume a uniform distribution.
beam_inclination_{min,max} (in radians) are used to determine the mapping.
Lidar frame to vehicle frame.
'Laser' is used interchangeably with 'Lidar' in this file.
(message has no fields)
Used in:
The full set of map features.
A set of dynamic states per time step. These are ordered in consecutive time steps.
Used in:
A unique ID to identify this feature.
Type specific data.
Used in:
Position in meters. The origin is an arbitrary location.
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Dimensions for the Matrix messages defined below. Must not be empty. The order of entries in 'dims' matters, as it indicates the layout of the values in the tensor in-memory representation. The first entry in 'dims' is the outermost dimension used to lay out the values; the last entry is the innermost dimension. This matches the in-memory layout of row-major matrices.
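A minimal sketch of turning a parsed MatrixFloat or MatrixInt32 message into a numpy array using the row-major contract stated above (data.size() == product(shape.dims)):

```python
import numpy as np

def matrix_to_numpy(matrix):
    """Reshape matrix.data according to matrix.shape.dims (row-major layout)."""
    return np.array(matrix.data).reshape(tuple(matrix.shape.dims))
```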
Metadata used for delta encoder.
Used in:
Range image's shape information in the compressed data.
Range image quantization precision for each range image channel.
Used in:
Coordinates of the center of the object bounding box.
The dimensions of the bounding box in meters.
The yaw angle in radians of the forward direction of the bounding box (the vector from the center of the box to the middle of the front box segment) counter clockwise from the X-axis (right hand system about the Z axis). This angle is normalized to [-pi, pi).
The velocity vector in m/s. This vector direction may be slightly different from the heading of the bounding box.
False if the state data is invalid or missing.
Non-self-intersecting 2d polygons. This polygon is not necessarily convex.
Used in:
A globally unique ID.
Range image is a 2d tensor. The first dim (row) represents pitch. The second dim represents yaw. There are two types of range images: 1. Raw range image: Raw range image with a non-empty 'range_image_pose_compressed' which tells the vehicle pose of each range image cell. 2. Virtual range image: Range image with an empty 'range_image_pose_compressed'. This range image is constructed by transforming all lidar points into a fixed vehicle frame (usually the vehicle frame of the middle scan). NOTE: 'range_image_pose_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_compressed); MatrixFloat range_image; range_image.ParseFromString(val); Inner dimensions are: * channel 0: range * channel 1: intensity * channel 2: elongation * channel 3: is in any no label zone.
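A Python sketch of the decompression recipe above, assuming the generated MatrixFloat message is importable (e.g. from the waymo_open_dataset package; the module path is an assumption) and using numpy for the final reshape:

```python
import zlib

import numpy as np
from waymo_open_dataset import dataset_pb2  # assumed module providing MatrixFloat

def decode_range_image(range_image_compressed: bytes) -> np.ndarray:
    """Return an [H, W, 4] array: range, intensity, elongation, no-label-zone flag."""
    range_image = dataset_pb2.MatrixFloat()
    range_image.ParseFromString(zlib.decompress(range_image_compressed))
    return np.array(range_image.data).reshape(tuple(range_image.shape.dims))
```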
Lidar point to camera image projections. A point can be projected to multiple camera images. We pick the first two in the following order: [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT]. Zlib compressed [H, W, 6] serialized version of MatrixInt32. To decompress: string val = ZlibDecompress(camera_projection_compressed); MatrixInt32 camera_projection; camera_projection.ParseFromString(val); Inner dimensions are: * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no projection. * channel 1: x (axis along image width) * channel 2: y (axis along image height) * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no projection. * channel 4: x (axis along image width) * channel 5: y (axis along image height) Note: pixel 0 corresponds to the left edge of the first pixel in the image.
Zlib compressed [H, W, 6] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_pose_compressed); MatrixFloat range_image_pose; range_image_pose.ParseFromString(val); Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Zlib compressed [H, W, 5] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_flow_compressed); MatrixFloat range_image_flow; range_image_flow.ParseFromString(val); Inner dimensions are [vx, vy, vz, pointwise class]. If the point is not annotated with scene flow information, class is set to -1. A point is not annotated if it is in a no-label zone or if its label bounding box does not have a corresponding match in the previous frame, making it infeasible to estimate the motion of the point. Otherwise, (vx, vy, vz) are velocity along (x, y, z)-axis for this point and class is set to one of the following values: -1: no-flow-label, the point has no flow information. 0: unlabeled or "background", i.e., the point is not contained in a bounding box. 1: vehicle, i.e., the point corresponds to a vehicle label box. 2: pedestrian, i.e., the point corresponds to a pedestrian label box. 3: sign, i.e., the point corresponds to a sign label box. 4: cyclist, i.e., the point corresponds to a cyclist label box.
Zlib compressed [H, W, 2] serialized version of MatrixInt32. To decompress: string val = ZlibDecompress(segmentation_label_compressed); MatrixInt32 segmentation_label; segmentation_label.ParseFromString(val); Inner dimensions are [instance_id, semantic_class]. NOTE: 1. Only TOP LiDAR has segmentation labels. 2. Not every frame has segmentation labels. This field is not set if a frame is not labeled. 3. There can be points missing segmentation labels within a labeled frame. Their labels are set to TYPE_NOT_LABELED when that happens.
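A similar sketch for segmentation_label_compressed, splitting the [H, W, 2] MatrixInt32 payload into instance and semantic channels (same assumed module path as in the range-image sketch above):

```python
import zlib

import numpy as np
from waymo_open_dataset import dataset_pb2  # assumed module providing MatrixInt32

def decode_segmentation_label(segmentation_label_compressed: bytes):
    """Return (instance_id, semantic_class) arrays of shape [H, W]."""
    label = dataset_pb2.MatrixInt32()
    label.ParseFromString(zlib.decompress(segmentation_label_compressed))
    arr = np.array(label.data).reshape(tuple(label.shape.dims))  # [H, W, 2]
    return arr[..., 0], arr[..., 1]
```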
Deprecated, do not use.
An object that must be predicted for the scenario.
Used in:
An index into the Scenario `tracks` field for the object to be predicted.
The difficulty level for this object.
A difficulty level for predicting a given track.
Used in:
Used in:
The type of road edge.
The polyline defining the road edge. A polyline is a list of points with segments defined between consecutive points.
Type of this road edge.
Used in:
Physical road boundary that doesn't have traffic on the other side (e.g., a curb or the k-rail on the right side of a freeway).
Physical road boundary that separates the car from other traffic (e.g. a k-rail or an island).
Used in:
The type of the lane boundary.
The polyline defining the road line. A polyline is a list of points with segments defined between consecutive points.
Type of this road line.
Used in:
The unique ID for this scenario.
Timestamps corresponding to the track states for each step in the scenario. The length of this field is equal to tracks[i].states_size() for all tracks i and equal to the length of the dynamic_map_states field.
The index into timestamps_seconds for the current time. All time steps after this index are future data to be predicted. All steps before this index are history data.
Tracks for all objects in the scenario. All object tracks in all scenarios in the dataset have the same number of object states. In this way, the tracks field forms a 2 dimensional grid with objects on one axis and time on the other. Each state can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., tracks[i].states[j] indexes the i^th agent's state at time timestamps_seconds[j].
The dynamic map states in the scenario (e.g. traffic signal states). This field has the same length as timestamps_seconds. Each entry in this field can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., dynamic_map_states[i] indexes the dynamic map state at time timestamps_seconds[i].
The set of static map features for the scenario.
The index into the tracks field of the autonomous vehicle object.
A list of object IDs in the scene detected to have interactive behavior. The objects in this list form an interactive group. These IDs correspond to IDs in the tracks field above.
A list of tracks to generate predictions for. For the challenges, exactly these objects must be predicted in each scenario for test and validation submissions. This field is populated in the training set only as a suggestion of objects to train on.
Per time step Lidar data. This contains lidar up to the current time step such that compressed_frame_laser_data[i] corresponds to the states at timestamps_seconds[i] where i <= current_time_index. This field is not populated in all versions of the dataset.
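A sketch pulling together the indexing conventions described above: tracks[i].states[j] is the i-th object's state at timestamps_seconds[j], and current_time_index separates history from future. The sdc_track_index name used for the autonomous-vehicle index is an assumption based on the field description above.

```python
def split_history_future(scenario, track_index):
    """Return (timestamp, state) pairs up to current_time_index and after it."""
    track = scenario.tracks[track_index]
    t = scenario.current_time_index
    pairs = list(zip(scenario.timestamps_seconds, track.states))
    return pairs[:t + 1], pairs[t + 1:]

# Example: history and future of the autonomous vehicle's track.
# history, future = split_history_future(scenario, scenario.sdc_track_index)
```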
Used in:
The polygon defining the outline of the speed bump. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Used in:
The IDs of lane features controlled by this stop sign.
The position of the stop sign.
The object states for a single object through the scenario.
Used in:
The unique ID of the object being tracked. The IDs start from zero and are non-negative.
The type of object being tracked.
The object states through the track. States include the 3D bounding boxes and velocities.
Used in:
This is an invalid state that indicates an error.
Used in:
The ID for the MapFeature corresponding to the lane controlled by this traffic signal state.
The state of the traffic signal.
The stopping point along the lane controlled by the traffic signal. This is the point where dynamic objects must stop when the signal is in a stop state.
Used in:
States for traffic signals with arrows.
Standard round traffic signals.
Flashing light signals.
4x4 row major transform matrix that transforms 3d points from one frame to another.
Used in:
Used in:
Used in:
Used in:
Velocity in m/s.
Angular velocity in rad/s.