A segment of a lane with a given adjacent boundary.
Used in:
The index into the lane's polyline where this lane boundary starts.
The index into the lane's polyline where this lane boundary ends.
The adjacent boundary feature ID of the MapFeature for the boundary. This can either be a RoadLine feature or a RoadEdge feature.
The adjacent boundary type. If the boundary is a road edge instead of a road line, this will be set to TYPE_UNKNOWN.
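An illustrative sketch of using a boundary segment: slice the lane's polyline between the start and end indices above to get the portion of the lane that runs along the referenced boundary feature. The lane_start_index / lane_end_index accessor names are assumptions standing in for the index fields described above.

```python
def lane_points_along_boundary(lane_polyline, boundary_segment):
    """Points of the lane polyline covered by this boundary segment (inclusive)."""
    start = boundary_segment.lane_start_index  # assumed field name, see comments above
    end = boundary_segment.lane_end_index      # assumed field name, see comments above
    return lane_polyline[start:end + 1]
```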
Used in:
1-d array of [f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3]. Note that these intrinsics correspond to the images after scaling. Camera model: pinhole camera. Lens distortion: radial distortion coefficients k1, k2, k3; tangential distortion coefficients p1, p2. k1, k2, k3, p1, p2 follow the same definitions as OpenCV. https://en.wikipedia.org/wiki/Distortion_(optics) https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html
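A minimal sketch of unpacking this intrinsic vector into an OpenCV-style camera matrix and distortion-coefficient vector (the [k1, k2, p1, p2, k3] ordering follows OpenCV's convention):

```python
import numpy as np

def unpack_intrinsics(intrinsic):
    """Split the 9-element intrinsic vector into camera matrix and distortion coeffs."""
    f_u, f_v, c_u, c_v, k1, k2, p1, p2, k3 = intrinsic
    camera_matrix = np.array([[f_u, 0.0, c_u],
                              [0.0, f_v, c_v],
                              [0.0, 0.0, 1.0]])
    dist_coeffs = np.array([k1, k2, p1, p2, k3])  # OpenCV ordering
    return camera_matrix, dist_coeffs
```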
Camera frame to vehicle frame.
Camera image size.
Used in:
All timestamps in this proto are represented as seconds since Unix epoch.
Used in:
JPEG image.
SDC pose.
SDC velocity at 'pose_timestamp' below. The velocity value is represented in the *global* frame. With this velocity, the pose can be extrapolated:
r(t+dt) = r(t) + dr/dt * dt, where dr/dt = v_{x,y,z}.
dR(t)/dt = W*R(t), where W = SkewSymmetric(w_{x,y,z}).
This differential equation solves to R(t) = exp(Wt)*R(0) if W is constant. When dt is small: R(t+dt) = (I + W*dt)*R(t).
Here r(t) = (x(t), y(t), z(t)) is the vehicle location at time t in the global frame, R(t) is the 3x3 rotation matrix from the body frame to the global frame at time t, and SkewSymmetric(x,y,z) is the cross-product matrix defined in https://en.wikipedia.org/wiki/Cross_product#Conversion_to_matrix_multiplication
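A minimal numpy sketch of the small-dt extrapolation above (illustrative only, not the dataset's pose utilities; note the first-order update does not keep R exactly orthonormal):

```python
import numpy as np

def skew(w):
    """Cross-product matrix SkewSymmetric(w_x, w_y, w_z)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def extrapolate_pose(r, R, v, w, dt):
    """r(t+dt) = r(t) + v*dt and R(t+dt) ~= (I + W*dt) R(t) for small dt."""
    r_new = np.asarray(r) + np.asarray(v) * dt
    R_new = (np.eye(3) + skew(w) * dt) @ np.asarray(R)
    return r_new, R_new
```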
Timestamp of the `pose` above.
Rolling shutter params. The following explanation assumes left->right rolling shutter. Rolling shutter cameras expose and read the image column by column, offset by the read out time for each column. The desired timestamp for each column is the middle of the exposure of that column, as outlined below for an image with 3 columns:
------time------>
|---- exposure col 1----| read |
-------|---- exposure col 2----| read |
--------------|---- exposure col 3----| read |
^trigger time                                  ^readout end time
^time for row 1 (= middle of exposure of row 1)
   ^time image center (= middle of exposure of middle row)
Shutter duration in seconds. Exposure time per column.
Time when the sensor was triggered and when last readout finished. The difference between trigger time and readout done time includes the exposure time and the actual sensor readout time.
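As a hedged illustration of these timing fields, the sketch below approximates the center-of-exposure time of one image column by interpolating between the trigger and readout-done times; the function and its linear-sweep assumption are illustrative, not the dataset's exact timing model.

```python
def approx_column_time(col, image_width, trigger_time, readout_done_time, shutter):
    """Approximate middle-of-exposure timestamp (seconds) for one image column.

    Assumes the column sweep is linear between trigger and readout-done and that
    each column is exposed for `shutter` seconds (see the diagram above).
    """
    sweep = (readout_done_time - trigger_time) - shutter  # time spent sweeping columns
    column_start = trigger_time + sweep * col / max(image_width - 1, 1)
    return column_start + 0.5 * shutter  # middle of that column's exposure
```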
Panoptic segmentation labels for this camera image. NOTE: Not every image has panoptic segmentation labels.
The camera labels associated with a given camera image. This message indicates the ground truth information for the camera image recorded by the given camera. If there are no labeled objects in the image, then the labels field is empty.
Used in:
(message has no fields)
Used in:
Panoptic (instance + semantic) segmentation labels for a given camera image. Associations can also be provided between each instance ID and a globally unique ID across all frames.
Used in:
The value used to separate instance_ids from different semantic classes. See the panoptic_label field for how this is used. Must be set to be greater than the maximum instance_id.
A uint16 png encoded image, with the same resolution as the corresponding camera image. Each pixel contains a panoptic segmentation label, which is computed as: semantic_class_id * panoptic_label_divisor + instance_id. We set instance_id = 0 for pixels for which there is no instance_id. NOTE: Instance IDs in this label are only consistent within this camera image. Use instance_id_to_global_id_mapping to get cross-camera consistent instance IDs.
The sequence id for this label. The above instance_id_to_global_id_mapping is only valid with other labels with the same sequence id.
A uint8 png encoded image, with the same resolution as the corresponding camera image. The value on each pixel indicates the number of cameras that overlap with this pixel. Used for the weighted Segmentation and Tracking Quality (wSTQ) metric.
A mapping between each panoptic label with an instance_id and a globally unique ID across all frames within the same sequence. This can be used to match instances across cameras and over time. i.e. instances belonging to the same object will map to the same global ID across all frames in the same sequence. NOTE: These unique IDs are not consistent with other IDs in the dataset, e.g. the bounding box IDs.
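A minimal sketch of decoding the panoptic_label PNG with the divisor arithmetic described above (assuming numpy and Pillow are available; variable names are illustrative):

```python
import io

import numpy as np
from PIL import Image

def decode_panoptic(panoptic_label_png: bytes, panoptic_label_divisor: int):
    """Split the uint16 panoptic image into semantic class and instance id maps."""
    panoptic = np.asarray(Image.open(io.BytesIO(panoptic_label_png)))
    semantic_class = panoptic // panoptic_label_divisor
    instance_id = panoptic % panoptic_label_divisor  # 0 where no instance exists
    return semantic_class, instance_id
```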
Used in:
If false, the corresponding instance will not have consistent global ids between frames.
Lidar data of a frame.
Used in:
The Lidar data for each timestamp.
Laser calibration data has the same length as that of lasers.
Poses of the SDC corresponding to the track states for each step in the scenario, similar to the one in the Frame proto.
Compressed Laser data.
Used in:
Range image is a 2d tensor. The first dimension (rows) represents pitch. The second dimension (columns) represents yaw. Zlib compressed range images include: Raw range image: raw range image with a non-empty 'range_image_pose_delta_compressed', which tells the vehicle pose of each range image cell. NOTE: 'range_image_pose_delta_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized DeltaEncodedData message version which stores MatrixFloat. To decompress (please see the documentation for lidar delta encoding): string val = delta_encoder.decompress(range_image_delta_compressed); MatrixFloat range_image; range_image.ParseFromString(val); Inner dimensions are: * channel 0: range * channel 1: intensity * channel 2: elongation * channel 3: is in any no label zone.
Zlib compressed [H, W, 6] serialized DeltaEncodedData message version which stores MatrixFloat. To decompress (please see the documentation for lidar delta encoding): string val = delta_encoder.decompress(range_image_pose_delta_compressed); MatrixFloat range_image_pose; range_image_pose.ParseFromString(val); Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_delta_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Used in:
A unique name that identifies the frame sequence.
Some stats for the run segment used.
Used in:
Day, Dawn/Dusk, or Night, determined from sun elevation.
Human readable location (e.g. CHD, SF) of the run segment.
Currently either Sunny or Rain.
Used in:
The number of unique objects with the type in the segment.
Used in:
The polygon defining the outline of the crosswalk. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Delta Encoded data structure. The protobuf compressed mask and residual data and the compressed data is encoded via zlib: compressed_bytes = zlib.compress( metadata + data_bytes + mask_bytes + residuals_bytes) The range_image_delta_compressed and range_image_pose_delta_compressed in the CompressedRangeImage are both encoded using this method.
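A heavily hedged sketch of peeling off the outer zlib layer described above. The import path is an assumption (the DeltaEncodedData message is named in these comments, but its generated module may differ in your release), and the inner mask/residual decoding should be done with the dataset's own lidar delta-encoding utilities.

```python
import zlib

# Hypothetical import path for the DeltaEncodedData message referenced above.
from waymo_open_dataset.protos import compressed_lidar_pb2

def parse_delta_encoded(blob: bytes):
    """zlib-decompress a *_delta_compressed blob and parse the container message."""
    delta = compressed_lidar_pb2.DeltaEncodedData()
    delta.ParseFromString(zlib.decompress(blob))
    return delta  # metadata, mask and residual fields then feed the delta decoder
```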
Used in:
The polygon defining the outline of the driveway region. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
The dynamic map information at a single time step.
Used in:
The traffic signal states for all observed signals at this time step.
Used in:
The timestamp associated with the dynamic feature data.
The set of traffic signal states for the associated time step.
This context is the same for all frames belonging to the same driving run segment. Use context.name to identify frames belonging to the same driving segment. We do not store all frames from one driving segment in one proto to avoid huge protos.
Frame start time, which is the timestamp of the first top LiDAR scan within this frame. Note that this timestamp does not correspond to the provided vehicle pose (pose).
Frame vehicle pose. Note that unlike in CameraImage, the Frame pose does not correspond to the provided timestamp (timestamp_micros). Instead, it roughly (but not exactly) corresponds to the vehicle pose in the middle of the given frame. The frame vehicle pose defines the coordinate system which the 3D laser labels are defined in.
The camera images.
The LiDAR sensor data.
Native 3D labels that correspond to the LiDAR sensor data. The 3D labels are defined w.r.t. the frame vehicle pose coordinate system (pose).
The native 3D LiDAR labels (laser_labels) projected to camera images. A projected label is the smallest image axis aligned rectangle that can cover all projected points from the 3d LiDAR label. The projected label is ignored if the projection is fully outside a camera image. The projected label is clamped to the camera image if it is partially outside.
Native 2D camera labels. Note that if a camera identified by CameraLabels.name has an entry in this field, then it has been labeled, even though it is possible that there are no labeled objects in the corresponding image, which is identified by a zero sized CameraLabels.labels.
No label zones in the *global* frame.
Map features. Only the first frame in a segment will contain map data. This field will be empty for other frames as the map is identical for all frames.
Map pose offset. This offset must be added to lidar points from this frame to compensate for pose drift and align with the map features.
Used in:
Object ID.
Difficulty level for detection problem.
Difficulty level for tracking problem.
The total number of lidar points in this box.
The total number of top lidar points in this box.
Used if the Label is a part of `Frame.laser_labels`.
Used if the Label is a part of `Frame.camera_labels`.
Used by Lidar labels to store in which camera it is mostly visible.
Used by Lidar labels to store a camera-synchronized box corresponding to the camera indicated by `most_visible_camera_name`. Currently, the boxes are shifted to the time when the most visible camera captures the center of the box, taking into account the rolling shutter of that camera. Specifically, given the object box living at the start of the Open Dataset frame (t_frame) with center position (c) and velocity (v), we aim to find the camera capture time (t_capture), when the camera indicated by `most_visible_camera_name` captures the center of the object. To this end, we solve the rolling shutter optimization considering both ego and object motion:
t_capture = image_column_to_time(
    camera_projection(c + v * (t_capture - t_frame),
                      transform_vehicle(t_capture - t_ref),
                      cam_params)),
where transform_vehicle(t_capture - t_ref) is the vehicle transform from a pose reference time t_ref to t_capture considering the ego motion, and cam_params is the camera extrinsic and intrinsic parameters. We then move the label box to t_capture by updating the center of the box as follows:
c_camera_synced = c + v * (t_capture - t_frame),
while keeping the box dimensions and heading direction. We use the camera_synced_box as the ground truth box for the 3D Camera-Only Detection Challenge. This makes the assumption that the users provide the detection at the same time as the most visible camera captures the object center.
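A toy numeric sketch of the center shift above (the values are made up for illustration; only the box center moves, dimensions and heading stay unchanged):

```python
import numpy as np

c = np.array([10.0, 2.0, 1.0])   # box center at t_frame, meters
v = np.array([5.0, 0.0, 0.0])    # object velocity, m/s
t_frame, t_capture = 0.0, 0.045  # seconds; t_capture comes from the rolling shutter solve

c_camera_synced = c + v * (t_capture - t_frame)  # shifted center of camera_synced_box
```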
Information to cross reference between labels for different modalities.
Used in:
Currently only CameraLabels with class `TYPE_PEDESTRIAN` store information about associated lidar objects.
Upright box, zero pitch and roll.
Used in:
Box coordinates in vehicle frame.
Dimensions of the box. length: dim x. width: dim y. height: dim z.
The heading of the bounding box (in radians). The heading is the angle required to rotate +x to the surface normal of the box front face. It is normalized to [-pi, pi).
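A one-line sketch for wrapping an arbitrary heading into the [-pi, pi) range mentioned above:

```python
import math

def normalize_heading(theta: float) -> float:
    """Wrap an angle in radians to [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi
```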
7-DOF 3D (a.k.a upright 3D box).
5-DOF 2D. Mostly used for laser top down representation.
Axis aligned 2D. Mostly used for image.
The difficulty level of this label. The higher the level, the harder it is.
Used in:
Used in:
Used in:
Used in:
The speed limit for this lane.
True if the lane interpolates between two other lanes.
The polyline data for the lane. A polyline is a list of points with segments defined between consecutive points.
A list of IDs for lanes that this lane may be entered from.
A list of IDs for lanes that this lane may exit to.
The boundaries to the left of this lane. There may be different boundary types along this lane. Each BoundarySegment defines a section of the lane with a given boundary feature to the left. Note that some lanes do not have any boundaries (i.e. lane centers in intersections).
The boundaries to the right of this lane. See left_boundaries for details.
A list of neighbors to the left of this lane. Neighbor lanes include only adjacent lanes going the same direction.
A list of neighbors to the right of this lane. Neighbor lanes include only adjacent lanes going the same direction.
Type of this lane.
Used in:
Used in:
The feature ID of the neighbor lane.
The self adjacency segment. The other lane may only be a neighbor for only part of this lane. These indices define the points within this lane's polyline for which feature_id is a neighbor. If the lanes are neighbors at disjoint places (e.g., a median between them appears and then goes away) multiple neighbors will be listed. A lane change can only happen from this segment of this lane into the segment of the neighbor lane defined by neighbor_start_index and neighbor_end_index.
The neighbor adjacency segment. These indices define the valid portion of the neighbor lane's polyline where that lane is a neighbor to this lane. A lane change can only happen into this segment of the neighbor lane from the segment of this lane defined by self_start_index and self_end_index.
A list of segments within the self adjacency segment that have different boundaries between this lane and the neighbor lane. Each entry in this field contains the boundary type between this lane and the neighbor lane along with the indices into this lane's polyline where the boundary type begins and ends.
Used in:
Used in:
If non-empty, the beam pitch (in radians) is non-uniform. When constructing a range image, this mapping is used to map from beam pitch to range image row. If this is empty, we assume a uniform distribution.
beam_inclination_{min,max} (in radians) are used to determine the mapping.
Lidar frame to vehicle frame.
'Laser' is used interchangeably with 'Lidar' in this file.
(message has no fields)
Used in:
The full set of map features.
A set of dynamic states per time step. These are ordered in consecutive time steps.
Used in:
A unique ID to identify this feature.
Type specific data.
Used in:
Position in meters. The origin is an arbitrary location.
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Row-major matrix. Requires: data.size() = product(shape.dims()).
Used in:
Dimensions for the Matrix messages defined below. Must not be empty. The order of entries in 'dims' matters, as it indicates the layout of the values in the tensor in-memory representation. The first entry in 'dims' is the outermost dimension used to lay out the values; the last entry is the innermost dimension. This matches the in-memory layout of row-major matrices.
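A minimal sketch of turning a parsed MatrixFloat or MatrixInt32 message into a numpy array using the row-major contract stated above (data.size() == product(shape.dims)):

```python
import numpy as np

def matrix_to_numpy(matrix):
    """Reshape matrix.data according to matrix.shape.dims (row-major layout)."""
    return np.array(matrix.data).reshape(tuple(matrix.shape.dims))
```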
Metadata used for delta encoder.
Used in:
Range image's shape information in the compressed data.
Range image quantization precision for each range image channel.
Used in:
Coordinates of the center of the object bounding box.
The dimensions of the bounding box in meters.
The yaw angle in radians of the forward direction of the bounding box (the vector from the center of the box to the middle of the front box segment) counter clockwise from the X-axis (right hand system about the Z axis). This angle is normalized to [-pi, pi).
The velocity vector in m/s. This vector direction may be slightly different from the heading of the bounding box.
False if the state data is invalid or missing.
Non-self-intersecting 2d polygons. This polygon is not necessarily convex.
Used in:
A globally unique ID.
Range image is a 2d tensor. The first dim (row) represents pitch. The second dim represents yaw. There are two types of range images: 1. Raw range image: Raw range image with a non-empty 'range_image_pose_compressed' which tells the vehicle pose of each range image cell. 2. Virtual range image: Range image with an empty 'range_image_pose_compressed'. This range image is constructed by transforming all lidar points into a fixed vehicle frame (usually the vehicle frame of the middle scan). NOTE: 'range_image_pose_compressed' is only populated for the first range image return. The second return has exactly the same range image pose as the first one.
Used in:
Zlib compressed [H, W, 4] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_compressed); MatrixFloat range_image; range_image.ParseFromString(val); Inner dimensions are: * channel 0: range * channel 1: intensity * channel 2: elongation * channel 3: is in any no label zone.
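A Python sketch of the decompression recipe above, assuming the generated MatrixFloat message is importable (e.g. from the waymo_open_dataset package; the module path is an assumption) and using numpy for the final reshape:

```python
import zlib

import numpy as np
from waymo_open_dataset import dataset_pb2  # assumed module providing MatrixFloat

def decode_range_image(range_image_compressed: bytes) -> np.ndarray:
    """Return an [H, W, 4] array: range, intensity, elongation, no-label-zone flag."""
    range_image = dataset_pb2.MatrixFloat()
    range_image.ParseFromString(zlib.decompress(range_image_compressed))
    return np.array(range_image.data).reshape(tuple(range_image.shape.dims))
```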
Lidar point to camera image projections. A point can be projected to multiple camera images. We pick the first two in the following order: [FRONT, FRONT_LEFT, FRONT_RIGHT, SIDE_LEFT, SIDE_RIGHT]. Zlib compressed [H, W, 6] serialized version of MatrixInt32. To decompress: string val = ZlibDecompress(camera_projection_compressed); MatrixInt32 camera_projection; camera_projection.ParseFromString(val); Inner dimensions are: * channel 0: CameraName.Name of 1st projection. Set to UNKNOWN if no projection. * channel 1: x (axis along image width) * channel 2: y (axis along image height) * channel 3: CameraName.Name of 2nd projection. Set to UNKNOWN if no projection. * channel 4: x (axis along image width) * channel 5: y (axis along image height) Note: pixel 0 corresponds to the left edge of the first pixel in the image.
Zlib compressed [H, W, 6] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_pose_compressed); MatrixFloat range_image_pose; range_image_pose.ParseFromString(val); Inner dimensions are [roll, pitch, yaw, x, y, z], representing a transform from vehicle frame to global frame for every range image pixel. This is ONLY populated for the first return. The second return is assumed to have exactly the same range_image_pose_compressed. The roll, pitch and yaw are specified as 3-2-1 Euler angle rotations, meaning that rotating from the navigation to vehicle frame consists of a yaw, then pitch and finally roll rotation about the z, y and x axes respectively. All rotations use the right hand rule and are positive in the counter clockwise direction.
Zlib compressed [H, W, 5] serialized version of MatrixFloat. To decompress: string val = ZlibDecompress(range_image_flow_compressed); MatrixFloat range_image_flow; range_image_flow.ParseFromString(val); Inner dimensions are [vx, vy, vz, pointwise class]. If the point is not annotated with scene flow information, class is set to -1. A point is not annotated if it is in a no-label zone or if its label bounding box does not have a corresponding match in the previous frame, making it infeasible to estimate the motion of the point. Otherwise, (vx, vy, vz) are velocity along (x, y, z)-axis for this point and class is set to one of the following values: -1: no-flow-label, the point has no flow information. 0: unlabeled or "background", i.e., the point is not contained in a bounding box. 1: vehicle, i.e., the point corresponds to a vehicle label box. 2: pedestrian, i.e., the point corresponds to a pedestrian label box. 3: sign, i.e., the point corresponds to a sign label box. 4: cyclist, i.e., the point corresponds to a cyclist label box.
Zlib compressed [H, W, 2] serialized version of MatrixInt32. To decompress: string val = ZlibDecompress(segmentation_label_compressed); MatrixInt32 segmentation_label; segmentation_label.ParseFromString(val); Inner dimensions are [instance_id, semantic_class]. NOTE: 1. Only TOP LiDAR has segmentation labels. 2. Not every frame has segmentation labels. This field is not set if a frame is not labeled. 3. There can be points missing segmentation labels within a labeled frame. Their labels are set to TYPE_NOT_LABELED when that happens.
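A similar sketch for segmentation_label_compressed, splitting the [H, W, 2] MatrixInt32 payload into instance and semantic channels (same assumed module path as in the range-image sketch above):

```python
import zlib

import numpy as np
from waymo_open_dataset import dataset_pb2  # assumed module providing MatrixInt32

def decode_segmentation_label(segmentation_label_compressed: bytes):
    """Return (instance_id, semantic_class) arrays of shape [H, W]."""
    label = dataset_pb2.MatrixInt32()
    label.ParseFromString(zlib.decompress(segmentation_label_compressed))
    arr = np.array(label.data).reshape(tuple(label.shape.dims))  # [H, W, 2]
    return arr[..., 0], arr[..., 1]
```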
Deprecated, do not use.
An object that must be predicted for the scenario.
Used in:
An index into the Scenario `tracks` field for the object to be predicted.
The difficulty level for this object.
A difficulty level for predicting a given track.
Used in:
Used in:
The type of road edge.
The polyline defining the road edge. A polyline is a list of points with segments defined between consecutive points.
Type of this road edge.
Used in:
Physical road boundary that doesn't have traffic on the other side (e.g., a curb or the k-rail on the right side of a freeway).
Physical road boundary that separates the car from other traffic (e.g. a k-rail or an island).
Used in:
The type of the lane boundary.
The polyline defining the road line. A polyline is a list of points with segments defined between consecutive points.
Type of this road line.
Used in:
The unique ID for this scenario.
Timestamps corresponding to the track states for each step in the scenario. The length of this field is equal to tracks[i].states_size() for all tracks i and equal to the length of the dynamic_map_states field.
The index into timestamps_seconds for the current time. All time steps after this index are future data to be predicted. All steps before this index are history data.
Tracks for all objects in the scenario. All object tracks in all scenarios in the dataset have the same number of object states. In this way, the tracks field forms a 2 dimensional grid with objects on one axis and time on the other. Each state can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., tracks[i].states[j] indexes the i^th agent's state at time timestamps_seconds[j].
The dynamic map states in the scenario (e.g. traffic signal states). This field has the same length as timestamps_seconds. Each entry in this field can be associated with a timestamp in the 'timestamps_seconds' field by its index. E.g., dynamic_map_states[i] indexes the dynamic map state at time timestamps_seconds[i].
The set of static map features for the scenario.
The index into the tracks field of the autonomous vehicle object.
A list of object IDs in the scene detected to have interactive behavior. The objects in this list form an interactive group. These IDs correspond to IDs in the tracks field above.
A list of tracks to generate predictions for. For the challenges, exactly these objects must be predicted in each scenario for test and validation submissions. This field is populated in the training set only as a suggestion of objects to train on.
Per time step Lidar data. This contains lidar up to the current time step such that compressed_frame_laser_data[i] corresponds to the states at timestamps_seconds[i] where i <= current_time_index. This field is not populated in all versions of the dataset.
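A sketch pulling together the indexing conventions described above: tracks[i].states[j] is the i-th object's state at timestamps_seconds[j], and current_time_index separates history from future. The sdc_track_index name used for the autonomous-vehicle index is an assumption based on the field description above.

```python
def split_history_future(scenario, track_index):
    """Return (timestamp, state) pairs up to current_time_index and after it."""
    track = scenario.tracks[track_index]
    t = scenario.current_time_index
    pairs = list(zip(scenario.timestamps_seconds, track.states))
    return pairs[:t + 1], pairs[t + 1:]

# Example: history and future of the autonomous vehicle's track.
# history, future = split_history_future(scenario, scenario.sdc_track_index)
```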
Used in:
The polygon defining the outline of the speed bump. The polygon is assumed to be closed (i.e. a segment exists between the last point and the first point).
Used in:
The IDs of lane features controlled by this stop sign.
The position of the stop sign.
The object states for a single object through the scenario.
Used in:
The unique ID of the object being tracked. The IDs start from zero and are non-negative.
The type of object being tracked.
The object states through the track. States include the 3D bounding boxes and velocities.
Used in:
This is an invalid state that indicates an error.
Used in:
The ID for the MapFeature corresponding to the lane controlled by this traffic signal state.
The state of the traffic signal.
The stopping point along the lane controlled by the traffic signal. This is the point where dynamic objects must stop when the signal is in a stop state.
Used in:
States for traffic signals with arrows.
Standard round traffic signals.
Flashing light signals.
4x4 row major transform matrix that transforms 3d points from one frame to another.
Used in:
Used in:
Used in:
Used in:
Velocity in m/s.
Angular velocity in rad/s.