Affine model according to [a b; c d] * x + [dx; dy]
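As a minimal sketch of this mapping, a 2D point (x, y) is multiplied by the 2x2 matrix [a b; c d] and then shifted by (dx, dy). The function name `apply_affine` and its argument order are illustrative only and do not come from the proto.

```python
def apply_affine(a, b, c, d, dx, dy, x, y):
    """Apply [a b; c d] * (x, y) + (dx, dy) and return the mapped point."""
    return (a * x + b * y + dx, c * x + d * y + dy)


# Identity linear part plus a translation of (5, -2).
print(apply_affine(1.0, 0.0, 0.0, 1.0, 5.0, -2.0, 10.0, 20.0))  # (15.0, 18.0)
```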
Transforms a 3D color vector x = (c1, c2, c3) according to [g_00 g_01 g_02 g_03; g_10 g_11 g_12 g_13; g_20 g_21 g_22 g_23] * [c1; c2; c3; 1]
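The following sketch applies such a 3x4 matrix to a color by extending (c1, c2, c3) with a trailing 1, so the fourth column acts as a per-channel offset. The function name and the list-of-rows layout for `g` are assumptions made for illustration.

```python
def apply_color_transform(g, color):
    """Multiply a 3x4 matrix g (three rows of four values) by (c1, c2, c3, 1)."""
    c1, c2, c3 = color
    return tuple(row[0] * c1 + row[1] * c2 + row[2] * c3 + row[3] for row in g)


# Swap the first two channels; double the third channel and add 0.5 to it.
g = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.5],
]
print(apply_color_transform(g, (0.25, 0.5, 0.125)))  # (0.5, 0.25, 0.75)
```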
The anchor representation for object detection.
Encoded anchor box center.
Encoded anchor box height.
Encoded anchor box width.
Options for the AnnotationOverlayCalculator.
The canvas width and height in pixels, and the background color. These options are used only if an input stream of ImageFrame isn't provided to the renderer calculator. If an input stream of ImageFrame is provided, then the calculator renders the annotations on top of the provided image, else a canvas is created with the dimensions and background color specified in these options and the annotations are rendered on top of this canvas.
Whether text should be rendered upside down. When it's set to false, text is rendered normally assuming the underlying image has its origin at the top-left corner. Therefore, for images with the origin at the bottom-left corner this should be set to true.
Whether input stream IMAGE_GPU (OpenGL texture) has bottom-left or top-left origin. (Historically, OpenGL uses bottom left origin, but most MediaPipe examples expect textures to have top-left origin.)
Scale factor for intermediate image for GPU rendering. This can be used to speed up annotation by drawing the annotation on an intermediate image with a reduced scale, e.g. 0.5 (of the input image width and height), before resizing and overlaying it on top of the input image.
The start time in seconds to decode.
The end time in seconds to decode (inclusive).
The stream to decode. Stream indexes start from 0 (audio and video are handled separately).
Process the file despite this stream not being present.
If true, failures to decode a frame of data will be ignored.
Output packets with regressing timestamps. By default those packets are dropped.
MPEG PTS timestamps roll over back to 0 after 26.5h. If this flag is set we detect any rollover and continue incrementing timestamps past this point. Set this flag if you want non-regressing timestamps for MPEG content where the PTS may roll over.
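The 26.5 h figure follows from the PTS being a 33-bit counter on a 90 kHz clock: 2^33 / 90000 s is roughly 26.5 hours. The sketch below shows one way to continue incrementing timestamps past a rollover. The half-range threshold used to tell a rollover from an ordinary regression is an assumption of this example, not a description of how the decoder detects it.

```python
PTS_MODULUS = 1 << 33      # MPEG PTS is a 33-bit counter
PTS_CLOCK_HZ = 90_000      # ticking at 90 kHz, so it wraps after about 26.5 hours


def unwrap_pts(raw_pts_values):
    """Return non-regressing timestamps by adding one modulus per detected rollover."""
    unwrapped = []
    offset = 0
    previous = None
    for pts in raw_pts_values:
        # A drop of more than half the counter range is treated as a rollover,
        # not as an ordinary out-of-order packet (assumed heuristic).
        if previous is not None and previous - pts > PTS_MODULUS // 2:
            offset += PTS_MODULUS
        unwrapped.append(pts + offset)
        previous = pts
    return unwrapped


print(round(PTS_MODULUS / PTS_CLOCK_HZ / 3600, 1))  # 26.5
```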
Max variance in color allowed, based on normalized color values.
Window radius. Results in a '(sigma_space*2+1) x (sigma_space*2+1)' size kernel. This should be set based on output image pixel space.
Binary feature descriptor for a particular feature. For example, ORB: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.4395&rep=rep1&type=pdf
TrackingData in compressed binary format. Obtainable via FlowPackager::EncodeTrackingData. Details of binary encode are below.
TrackingContainer::header = "TRAK"
A representation of a bounding box.
File path to the template index files.
Proto to hold BoxDetector's internal search index.
Message to hold keypoints and descriptors for each box.
Message to hold keypoints and descriptors for each appearance. One box could have multiple appearances to account for shape and perspective change, etc.
Forces the detector to run every N frames. 0 means detection is never called, 1 means detect every frame, 2 means detect every other frame, etc. Currently only applied to image query mode.
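A small sketch of this rule follows. The parameter name `detect_every_n_frame` is hypothetical, and the choice of which frame in each group of N triggers detection (here, the first) is an assumption, since the comment does not specify it.

```python
def should_run_detector(frame_index, detect_every_n_frame):
    """Return True when the detector is forced to run on this frame."""
    if detect_every_n_frame <= 0:
        return False  # 0 means detection is never called
    return frame_index % detect_every_n_frame == 0


# With N = 2, detection runs on every other frame.
print([should_run_detector(i, 2) for i in range(4)])  # [True, False, True, False]
```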
Enable box detection when tracked boxes are out of the FOV. Detection ceases after the detector successfully re-acquires the box.
Options for detection function with image query.
Dimensions (number of elements) for feature descriptor.
Minimum number of correspondences to go through RANSAC.
Reprojection threshold for RANSAC to find inliers.
Max distance to match 2 NIMBY features.
Max perspective change factor.
Options only for detection from image queries.
Resize the input image's longer edge to this size. Skip resizing if the input size is already smaller than this size.
Scale factor between adjacent pyramid levels.
Maximum number of pyramid levels.
Max number of features the detector uses.
Available types of detector's index and search structure.
BFMatcher from OpenCV
Initial position to be tracked. Can also be supplied as side packet or as input stream.
If set and VIZ stream is present, renders tracking data into the visualization.
If set and VIZ stream is present, renders the box state into the visualization.
If set and VIZ stream is present, renders the internal box state into the visualization.
Size of the track data cache during streaming mode. This allows buffering track_data for fast forward tracking, i.e. any TimedBox received via input stream START_POS can be tracked towards the current track head (i.e. last received TrackingData). Measured in number of frames.
Add a transition period of N frames to smooth the jump from original tracking to reset start pos with motion compensation. The transition will be a linear decay of original tracking result. 0 means no transition.
Chunk size for caching files. Should be equal to those written by the FlowPackagerCalculator.
Chunk file format.
Number of simultaneous tracking requests.
Maximum waiting time for next chunk, till function times out.
If set, box tracker will record the state for each computed TimedBox across all paths.
Actual tracking options to be used for every step.
Describes the topology and function of a MediaPipe Graph. The graph of Nodes must be a Directed Acyclic Graph (DAG) except as annotated by "back_edge" in InputStreamInfo. Use a mediapipe::CalculatorGraph object to run the graph.
The nodes.
Create a side packet using a PacketFactory. This side packet is created as close to the worker that does the work as possible. A PacketFactory is basically a PacketGenerator that takes no input side packets and produces a single output side packet.
Configs for PacketGenerators. Generators take zero or more input side packets and produce any number of output side packets. For example, MediaDecoderCalculator takes an input side packet with type DeletingFile. However, most users want to specify videos by ContentIdHex (i.e. video id). By using the VideoIdToLocalFileGenerator, a user can specify a video id (as a string) and obtain a DeletingFile to use with the decoder. PacketGenerators can take as an input side packet the output side packet of another PacketGenerator. The graph of PacketGenerators must be a directed acyclic graph.
Number of threads for running calculators in multithreaded mode. If not specified, the scheduler will pick an appropriate number of threads depending on the number of available processors. To run on the calling thread, specify "ApplicationThreadExecutor"; see http://g3doc/mediapipe/g3doc/running.md.
Configs for StatusHandlers that will be called after each call to Run() on the graph. StatusHandlers take zero or more input side packets and the absl::Status returned by a graph run. For example, a StatusHandler could store information about graph failures and their causes for later monitoring. Note that graph failures during initialization may cause required input side packets (created by a PacketFactory or PacketGenerator) to be missing. In these cases, the handler with missing input side packets will be skipped.
Specify input streams to the entire graph. Streams specified here may have packets added to them using CalculatorGraph::AddPacketToInputStream. This works much like a source calculator, except that the source is outside of the mediapipe graph.
Output streams for the graph when used as a subgraph.
Input side packets for the graph when used as a subgraph.
Output side packets for the graph when used as a subgraph.
Maximum queue size of any input stream in the graph. This can be used to control the memory usage of a MediaPipe graph by preventing fast sources from flooding the graph with packets. Any source that is connected to an input stream that has hit its maximum capacity will not be scheduled until the queue size falls under the specified limits, or if the scheduler queue is empty and no other nodes are running (to prevent possible deadlocks due to an incorrectly specified value). This global parameter is set to 100 packets by default to enable pipelining. If any node indicates that it buffers packets before emitting them, then max(node_buffer_size, max_queue_size) is used. Set this parameter to -1 to disable throttling (i.e. the graph will use as much memory as it requires).
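The limit rules described here (default of 100, -1 to disable throttling, and the maximum with a node's buffer size) can be sketched as a small helper. The function and argument names are illustrative and are not framework APIs.

```python
DEFAULT_MAX_QUEUE_SIZE = 100  # default stated in the field description


def effective_queue_limit(max_queue_size=None, node_buffer_size=0):
    """Return the packet limit for an input queue, or None when throttling is off."""
    if max_queue_size is None:
        max_queue_size = DEFAULT_MAX_QUEUE_SIZE
    if max_queue_size == -1:
        return None  # -1 disables throttling
    # A node that buffers packets raises the limit to its buffer size if larger.
    return max(node_buffer_size, max_queue_size)


print(effective_queue_limit())                         # 100
print(effective_queue_limit(10, node_buffer_size=30))  # 30
print(effective_queue_limit(-1))                       # None
```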
If true, the graph run fails with an error when throttling prevents all calculators from running. If false, max_queue_size for an input stream is adjusted when throttling prevents all calculators from running.
Config for this graph's InputStreamHandler. If unspecified, the framework will automatically install the default handler, which works as follows. The calculator's Process() method is called for timestamp t when:
- at least one stream has a packet available at t; and,
- all other streams either have packets at t, or it is known that they will not have packets at t (i.e. their next timestamp bound is greater than t).
The handler then provides all available packets with timestamp t, with no preprocessing.
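The readiness condition of the default handler can be expressed as a predicate over the state of each input stream. This is a sketch of the rule as worded above, not the framework's implementation; the dictionary keys `packets` and `next_bound` are hypothetical.

```python
def ready_at(t, streams):
    """Decide whether Process() may be called for timestamp t.

    streams: list of dicts with 'packets' (set of queued timestamps) and
    'next_bound' (smallest timestamp any future packet could carry).
    """
    has_packet = [t in s["packets"] for s in streams]
    if not any(has_packet):
        return False  # at least one stream must have a packet at t
    # Every other stream must have a packet at t or be known to skip t.
    return all(hp or s["next_bound"] > t for hp, s in zip(has_packet, streams))


streams = [
    {"packets": {10}, "next_bound": 11},
    {"packets": set(), "next_bound": 20},  # known to have nothing at t = 10
]
print(ready_at(10, streams))  # True
```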
Config for this graph's OutputStreamHandler. If unspecified, the default output stream handler will be automatically installed by the framework which does not modify any outgoing packets.
Configs for Executors. The names of the executors must be distinct. The default executor, whose name is the empty string, is predefined. The num_threads field of the CalculatorGraphConfig specifies the number of threads in the default executor. If the config for the default executor is specified, the CalculatorGraphConfig must not have the num_threads field.
The default profiler-config for all calculators. If set, this defines the profiling settings such as num_histogram_intervals for every calculator in the graph. Each of these settings can be overridden by the |profiler_config| specified for a node.
The namespace used for class name lookup within this graph. An unqualified or partially qualified class name is looked up in this namespace first and then in enclosing namespaces.
The type name for the graph config, used for registering and referencing the graph config.
The types and default values for graph options, in proto2 syntax.
The types and default values for graph options, in proto3 syntax.
A single node in the DAG.
The name of the node. This field is optional and doesn't generally need to be specified, but does improve error messaging.
The registered type of a calculator (provided via REGISTER_CALCULATOR), or of a subgraph (via REGISTER_MEDIAPIPE_GRAPH).
A Calculator can choose to access its input streams, output streams, and input side packets either by tag or by index. If the calculator chooses indexes then it will receive the streams or side packets in the same order as they are specified in this proto. If the calculator chooses to use tags then it must specify a tag along with each name. The field is given as "TAG:name", meaning a tag name followed by a colon followed by the name. Tags use only upper case letters, numbers, and underscores, whereas names use only lower case letters, numbers, and underscores. Example:
  Node {
    calculator: "SomeAudioVideoCalculator"
    # This calculator accesses its inputs by index (no tag needed).
    input_stream: "combined_input"
    # This calculator accesses its outputs by tags, so all
    # output_streams must specify a tag.
    output_stream: "AUDIO:audio_stream"
    output_stream: "VIDEO:video_stream"
    # This calculator accesses its input side packets by tag.
    input_side_packet: "MODEL:model_01"
  }
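The character rules for tags and names can be checked with a short parser. This sketch implements only the rule as stated in this description (optional upper-case tag, lower-case name); the framework's own parser may accept additional forms, and `parse_stream_spec` is a hypothetical helper name.

```python
import re

# Optional "TAG:" prefix (upper case, digits, underscores), then a name
# (lower case, digits, underscores).
_STREAM_SPEC = re.compile(r"(?:(?P<tag>[A-Z0-9_]+):)?(?P<name>[a-z0-9_]+)")


def parse_stream_spec(spec):
    """Split a "[TAG:]name" string into (tag, name); tag is None when absent."""
    match = _STREAM_SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"not a valid [TAG:]name string: {spec!r}")
    return match.group("tag"), match.group("name")


print(parse_stream_spec("AUDIO:audio_stream"))  # ('AUDIO', 'audio_stream')
print(parse_stream_spec("combined_input"))      # (None, 'combined_input')
```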
String(s) representing "TAG:name" of the stream(s) from which the current node will get its inputs. "TAG:" part is optional, see above. A calculator with no input stream is a source.
String(s) representing "TAG:name" of the stream(s) produced by this node. "TAG:" part is optional, see above. These must be different from any other output_streams specified for other nodes in the graph.
String(s) representing "TAG:name" of the input side packet(s). "TAG:" part is optional, see above.
String(s) representing "TAG:name" of the output side packet(s). Only used by subgraphs. "TAG:" part is optional, see above.
The options passed to the Calculator, in proto2 syntax.
The options passed to the Calculator, in proto3 syntax. Each node_options message must have a different message type. If the same message type is specified in |options| and |node_options|, only the message in |options| is used.
For a Source Calculator (i.e. a calculator with no inputs), this is the "layer" on which the calculator is executed. For a non-source calculator (i.e. a calculator with one or more input streams) this field has no effect. The sources on each layer are completely exhausted before Process() is called on any source calculator on a higher numbered layer. Example:
  Decoder -> Median Frame (requires all frames) -> Image Subtraction
          --------------------------------------->
The entire video will be buffered on the edge from the decoder to the Image subtraction. To fix this problem, layers can be used.
  Decoder (layer 0) -> Median Frame -> Image Subtraction
  Decoder (layer 1) ----------------->
The frames from layer 0 will no longer be buffered, but the video will be decoded again instead. Note that different options can be used in the second decoder.
Optional parameter that allows the user to indicate to the scheduler that this node has a buffering behavior (i.e. waits for a bunch of packets before emitting any) and specify the size of the buffer that is built up. The scheduler will then try to keep the maximum size of any input queues in the graph to remain below the maximum of all buffer_size_hints and max_queue_size (if specified). The ideal value is typically something larger than the actual number of buffered packets to maintain pipelining. The default value 0 indicates that the node has no buffering behavior.
Config for this node's InputStreamHandler. If unspecified, the graph-level input stream handler will be used.
Config for this node's OutputStreamHandler. If unspecified, the graph-level output stream handler will be used.
Additional information about an input stream. The |name| field of the InputStreamInfo must match an input_stream.
Set the executor which the calculator will execute on.
TODO: Remove from Node when switched to Profiler. DEPRECATED: Configs for the profiler.
The maximum number of invocations that can be executed in parallel. If not specified, the limit is one invocation.
DEPRECATED: For backwards compatibility we allow users to specify the old name for "input_side_packet" in proto configs. These are automatically converted to input_side_packets during config canonicalization.
A protobuf extension defining a list of template rules.
The base configuration.
The list of template rules.
Options for Calculators. Each Calculator implementation should have its own options proto, which should look like this:
  message MyCalculatorOptions {
    extend CalculatorOptions {
      optional MyCalculatorOptions ext = <unique id, e.g. the CL#>;
    }
    optional string field_needed_by_my_calculator = 1;
    optional int32 another_field = 2;
    // etc
  }
If true, this proto specifies a subset of field values, which should override corresponding field values.
Stores the profiling information for a calculator node. All the times are in microseconds.
The calculator name.
Total time the calculator spent on Open (in microseconds).
Total time the calculator spent on Close (in microseconds).
Total and histogram of the time that the calculator spent on the Process() (in microseconds).
Total and histogram of the input latency, i.e. the difference between the input timestamp and the Process() call time (in microseconds).
Total and histogram of the output latency, i.e. the difference between the input timestamp and the time Process() finished.
Total and histogram of the time that input streams of this calculator took.
The type of the data pointer that the callback will put data into.
The location of the data stored as a string printed with snprintf(address, sizeof(address), "%p", pointer). This calculator only produces a reasonable callback if it is constructed on the same machine as the original pointer was created on and that pointer is still alive.
Next tag: 33
Background motion expressed in various models. These are per-frame pair motions (from current to previous frame). Models are expressed in the un-normalized domain frame_width x frame_height that is passed to MotionEstimation (stored below).
Frame dimensions camera motion was computed over.
Mixture homographies computed w.r.t. exponentially increasing regularizers. Above mixture_homography member is selected from spectrum based on amount of rolling shutter present in the video.
Relative row sigma w.r.t. frame_height for mixture models.
Average of all motion vector magnitudes (without accounting for any motion model), within 10th to 90th percentile (to remove outliers).
Inlier-weighted variance of the translation model. Specified, w.r.t. unnormalized video domain that motion models are computed for.
Ratio of inliers w.r.t. regular and stricter thresholds. In [0, 1].
Average registration error of homography in pixels. Note: These two parameters default to zero in case homographies have not been estimated.
Fraction, in [0,1], of homography inliers.
Same as above but with stricter threshold. (For details, see: MotionEstimationOptions::strict_coverage_scale). Coverage is designed to measure the amount of significant outliers, which can affect the validity of the estimated homography. However, it does not discount small outliers, which occur in case of small rolling shutter wobbles. For this a stricter version of coverage is needed, which is essential for computing the rolling_shutter_guess, i.e. the increase in coverage by using mixtures vs. homographies.
Per-block inlier fraction for mixtures.
Set based on stability analysis indicating if frame is likely to originate from a rolling shutter camera. (-1 is used to indicate frame was not tested, e.g. due to mixture deemed unstable for analysis). Guess is a scalar indicating by how much the mixture models (suitable for rolling shutter distortions) increased inlier coverage compared to a single homography. For example, a value of 1.3 indicates that the mixture models increased inlier coverage by 30%. If not -1, range is in [0, inf] (values slightly smaller than 1 are possible due to suppression of noisy feature tracks during estimation).
Indicating if CameraMotion is deemed to originate from rolling shutter camera (index >= 0), and if so, denotes the index in the mixture_homography_spectrum, where higher indices correspond to heavier regularized motions. If motion is not deemed to originate from a rolling shutter camera, index is set to -1.
List of overlay indices (cell locations in column major format) over domain of size overlay_domain x overlay_domain, where overlay_domain is set by MotionEstimation to MotionEstimationOptions::OverlayDetectionOptions::analysis_mask_size. Overlay analysis is performed over chunks of frames, as specified by MotionEstimationOptions::overlay_analysis_chunk_size, with the resulting overlay indices being assigned to each frame of the chunk. Consequently it suffices to store the result only for the first frame of every chunk. Subsequent frames store a single negative index relative to the first chunk frame indicating where to locate the overlay indices. Specifically if for frame f, overlay_indices(0) == -2, overlay indices for corresponding chunk can be found at frame f - 2. For details about how overlay indices are used to flag a frame to contain an overlay, see MotionFilterOptions::OverlayOptions.
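The relative-index scheme can be illustrated with a lookup that follows the negative pointer back to the first frame of the chunk. The per-frame list-of-lists layout and the function name are assumptions made for this sketch.

```python
def resolve_overlay_indices(overlay_indices_per_frame, f):
    """Return the overlay indices that apply to frame f."""
    indices = overlay_indices_per_frame[f]
    # A single negative entry is a relative pointer to the first chunk frame.
    if len(indices) == 1 and indices[0] < 0:
        return overlay_indices_per_frame[f + indices[0]]
    return indices


# Two chunks: frames 0-2 share [3, 7, 12]; frames 3-4 share [5].
frames = [[3, 7, 12], [-1], [-2], [5], [-1]]
print(resolve_overlay_indices(frames, 2))  # [3, 7, 12]
print(resolve_overlay_indices(frames, 4))  # [5]
```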
If set, stores original type in case it was overridden (by filtering functions, etc.).
Same as in RegionFlowFeatureList (from region_flow.proto), measures blur as average cornerness over textured areas. As it depends on the image content, should only be used relative.
Quantifies amount of blur. Specified as ratio w.r.t. sharpest matching frame, i.e. 1 indicates no blur, values > 1 amount of blur w.r.t. sharpest frame.
Same as in RegionFlowFeatureList (from region_flow.proto). Stores fraction of long feature tracks that got rejected for this frame.
Same as in RegionFlowFeatureList (from region_flow.proto). Timestamp in microseconds of the underlying frame.
Same as in RegionFlowFeatureList (from region_flow.proto). Denotes the frame that motion was computed w.r.t., locally to the current frame. Values < 0 indicate backward tracking, while values > 0 indicate forward tracking. For example, match_frame = -1 indicates tracking is from current to previous frame.
Set of optional *bit* flags set for various purposes.
Set to indicate presence of a shot boundary.
Set if frame is considered sharp in a neighborhood of frames.
Indicates that estimation resulted in singular optimization problem. Used internally by MotionEstimation.
Indicates if shot boundary is part of a fade. If so, all frames of the fade will be labeled with the FLAG but only the begin and end of the fade will have the FLAG_SHOT_BOUNDARY set.
Set if frame is exact duplicate of previous frame.
Indicates this frame is at the
CameraMotion type indicates whether highest degree of freedom (DOF) model estimation was deemed stable, in which case CameraMotion::Type is set to VALID. If a model was deemed not stable (according to *StabilityBounds in MotionEstimationOptions), it is set to the lower dof type which was deemed stable.
All requested motion models estimated reliably.
Fallback to homographies, mixture unreliable.
Fallback to similarity model, homography unreliable.
Fallback to translation model, similarity unreliable, legacy naming.
Identity model, translation unreliable.
The index of the class in the corresponding label map.
The probability score for this class.
Label or name of the class.
Optional human-readable string for display purposes.
Group of Classification protos.
Over/Under exposure setting. Pixels that are clipped due to limited dynamic range are masked out from analysis. Values specified w.r.t. [0, 1] range.
A pixel can have clipped color values in at most max_clipped_channels before it will be labeled as clipped.
Over-exposure tends to show blooming (neighboring pixels are affected by over-exposure as well). For robustness, the mask of clipped pixels is dilated with a structuring element of diameter clip_mask_diam.
The minimum size an input iterable collection should have for the calculator to output true.
Mapping from string label to a color.
rotate 90 degrees counterclockwise
hack to rectify convfloat
See DefaultInputStreamHandler for documentation.
batch_size determines how many input packets should be collected before a calculator can process them. Once there are enough packets, Process method of the Calculator is called sequentially. Currently, batching is not supported for source nodes but it may be supported in the future. Therefore, this field should not be specified for source nodes.
i-th label or label_id has a score encoded by the i-th element in score.
Location data corresponding to all detected labels above.
Optional string to indicate the feature generation method. Useful in associating a name to the pipeline used to generate this detection.
Optional string to specify track_id if detection is part of a track.
Optional unique id to help associate different Detections to each other.
Human-readable string for display, intended for debugging purposes. The display name corresponds to the label (or label_id). This is optional.
The timestamp (in microseconds) *at which* this detection was created/detected.
Useful for associating a detection with other detections based on the detection_id. For example, this could be used to associate a face detection with a body detection when they belong to the same person.
Path to a label map file for getting the actual name of detected classes.
Alternative way to specify label map:
  label: "label for id 0"
  label: "label for id 1"
  ...
By default, the `label_id` field from the input is stripped if a text label could be found. By setting this field to true, it is always copied to the output detections.
Group of Detection protos.
Specify the rotation angle of the output rect with a vector formed by connecting two keypoints in the detection, together with the target angle (can be in radians or in degrees) of that vector after rotation. The target angle is counter-clockwise starting from the positive x-axis.
In radians.
In degrees.
Whether to output a zero-rect (with origin and size both zero) when the input detection vector is empty.
If true, produces a RenderData packet with no annotation when the input packet has no detection. Otherwise, it won't produce any packet. Please note that, regardless of this flag, nothing will be produced if there is no input packet for a timestamp.
The delimiter to separate label(_id) and score.
If true, each "label(_id),score" will be on a separate line. Otherwise, all "label(_id),score" will be concatenated when the detection has more than one label.
Rendering options for the label.
Thickness for drawing the label(s) and the location_data(box).
Color for drawing the label(s), feature_tag, and the location_data(box).
An optional string that identifies this class of annotations for the render data output this calculator produces. If multiple instances of this calculator are present in the graph, this value should be unique among them.
If true, renders the detection id in the first line before the labels.
Describes a MediaPipe Executor.
The name of the executor (used by a CalculatorGraphConfig::Node or PacketGeneratorConfig to specify which executor it will execute on). This field must be unique within a CalculatorGraphConfig. If this field is omitted or is an empty string, the ExecutorConfig describes the default executor. NOTE: The names "default" and "gpu" are reserved and must not be used.
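The naming constraints above (unique names, reserved names "default" and "gpu", empty string for the default executor) can be sketched as a validation pass. The helper name is hypothetical, and the framework's own validation may differ in details such as error types.

```python
RESERVED_EXECUTOR_NAMES = {"default", "gpu"}


def validate_executor_names(names):
    """Check executor names from one graph config; '' denotes the default executor."""
    seen = set()
    for name in names:
        if name in RESERVED_EXECUTOR_NAMES:
            raise ValueError(f"executor name {name!r} is reserved")
        if name in seen:
            raise ValueError(f"duplicate executor name {name!r}")
        seen.add(name)


validate_executor_names(["", "io_pool", "inference_pool"])  # passes silently
```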
The registered type of the executor. For example: "ThreadPoolExecutor". The framework will create an executor of this type (with the options in the options field) for the CalculatorGraph. The ExecutorConfig for the default executor may omit this field and let the framework choose an appropriate executor type. Note: If the options field is used in this case, it should contain the ThreadPoolExecutorOptions. If the ExecutorConfig for an additional (non-default) executor omits this field, the executor must be created outside the CalculatorGraph and passed to the CalculatorGraph for use.
The options passed to the Executor. The extension in the options field must match the type field. For example, if the type field is "ThreadPoolExecutor", then the options field should contain the ThreadPoolExecutorOptions.
Describes a field within a message.
(message has no fields)
0 is reserved for errors.
Order is weird for historical reasons.
Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT64 if negative values are likely.
Not ZigZag encoded. Negative numbers take 10 bytes. Use TYPE_SINT32 if negative values are likely.
Tag-delimited aggregate. Group type is deprecated and not supported in proto3. However, Proto3 implementations should still be able to parse the group wire format and treat group fields as unknown fields.
Length-delimited aggregate.
New in version 2.
Uses ZigZag encoding.
Uses ZigZag encoding.
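The size difference between the plain and ZigZag-encoded integer types can be reproduced with a few lines. This is a sketch of the standard varint and ZigZag rules from the Protocol Buffers wire format; the helper names are illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def zigzag_encode64(n):
    """Map a signed 64-bit integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def varint_length(unsigned_value):
    """Number of bytes a base-128 varint needs for an unsigned integer."""
    length = 1
    while unsigned_value >= 0x80:
        unsigned_value >>= 7
        length += 1
    return length


def int64_wire_length(n):
    """Plain int64: negatives are sent as their 64-bit two's complement."""
    return varint_length(n & MASK64)


def sint64_wire_length(n):
    """sint64: ZigZag first, then varint."""
    return varint_length(zigzag_encode64(n))


print(int64_wire_length(-1), sint64_wire_length(-1))  # 10 1
```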
See FixedSizeInputStreamHandler for documentation.
The queue size at which input queues are truncated.
The queue size to which input queues are truncated.
If false, input queues are truncated to at most trigger_queue_size. If true, input queues are truncated to at least trigger_queue_size.
The maximum number of frames released for processing at one time. The default value limits to 1 frame processing at a time.
The maximum number of frames queued waiting for processing. The default value limits to 1 frame awaiting processing.
The maximum time in microseconds to wait for a frame to finish processing. The default value stops waiting after 1 sec. The value 0 specifies no timeout.
Chunk size for caching files that are written to the externally specified caching directory. Specified in msec. Note that each chunk always contains at its end the first frame of the next chunk (to enable forward tracking across chunk boundaries).
Options controlling compression and encoding.
Tracking data is resolution independent specified w.r.t. specified domain. Only values <= 256 are supported if binary tracking data is requested to be supported (see below).
Needs to be set for calls to FlowPackager::EncodeTrackingData. If encoding is not required, can be set to false in which case a higher domain_width can be used.
If set uses 16 bit encode for vector data, in BinaryTrackingData, otherwise only 8 bits are used.
In high profile encode, re-use previously encoded vector when absolute difference to current vector is below threshold.
High profile encoding flags.
Specifies the maximum and minimum values used for truncation when normalizing optical flow fields.
Next index: 7
Interval at which frames should be sampled; set to zero if sampling should not be enforced (i.e. selection is performed w.r.t. other criteria).
Bandwidth used during dynamic programming. The larger the bandwidth, the more accurate the result w.r.t. the specified sampling rate. Smaller bandwidths bias the solution suboptimally to center around the mean frame numbers of the sampling rate. If in (0, 1), assumed to specify fraction of total number of input frames, otherwise must be an integer.
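The two interpretations of the bandwidth value can be sketched as follows. Truncating the fractional case to an integer is an assumption of this example, as the description does not say how the fraction is rounded; the function name is hypothetical.

```python
def bandwidth_in_frames(bandwidth, num_input_frames):
    """Convert the bandwidth setting into a number of frames."""
    if 0 < bandwidth < 1:
        # Fraction of the total number of input frames (truncation assumed).
        return int(bandwidth * num_input_frames)
    if bandwidth != int(bandwidth):
        raise ValueError("bandwidth outside (0, 1) must be an integer")
    return int(bandwidth)


print(bandwidth_in_frames(0.25, 200))  # 50
print(bandwidth_in_frames(30, 200))    # 30
```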
Search radius for dynamic programming (how many frames you are allowed to search around the previous frame).
Allows one to specify custom solution selection criteria (i.e. different way to choose the best row of the computed cost matrix).
Outputs a fixed number of frames and automatically sets the appropriate sampling rate. Set to 0 by default (i.e. not enabled).
Options for computing frame selection. TODO: Support multiple criteria if required. Currently uses only the first one.
FrameSelection buffers incoming CameraMotions for specified chunk size and creates cost matrices upon reaching the limit. TODO: Implement if necessary (currently nothing is cleared upon reaching the limit).
Stores the result of the frame selection, with composited features. Next index: 6
Timestamp of the selected frame.
Frame index of the selected frame in the initial video stream. If this timestamp was manufactured, this will be the index of the initial frame.
CameraMotion from selected item to previous selected item.
Features from selected item to previous selected item.
If this FrameSelectionResult was the result of processing a previous one, the timestamp of the original frame.
(message has no fields)
Class of type FrameSelectionSolution that computes the best row.
Stores selected timestamps and corresponding frame index.
Timestamp of the selected frame.
Frame index of the selected frame in the initial video stream. If this timestamp was manufactured, this will be the index of the initial frame.
If this timestamp was manufactured, the timestamp of the original frame.
Transforms a 3D color vector x = (c1, c2, c3) according to [gain_c1 0 0 bias_c1; 0 gain_c2 0 bias_c2; 0 0 gain_c3 bias_c3] * [c1; c2; c3; 1]
By default an empty packet in the ALLOW or DISALLOW input stream indicates disallowing the corresponding packets in the data input streams. Setting this option to true inverts that, allowing the data packets to go through.
Next id: 8.
Output dimensions.
A scale factor for output size, while keeping aspect ratio. It has lower priority than the above two fields. That is, it is effective only when the above two fields are unset.
Counterclockwise rotation in degrees. Must be a multiple of 90.
Flip the output texture vertically. This is applied after rotation.
Flip the output texture horizontally. This is applied after rotation.
Output frame scale mode. Default is FILL_AND_CROP.
(message has no fields)
Used in: ,
OpenGL: bottom-left origin; Metal: top-left origin
OpenGL: top-left origin; Metal: top-left origin
Latency events and summaries for recent mediapipe packets.
Recent packet timing information about each calculator node and stream.
Aggregated latency information about each calculator node.
The canonicalized calculator graph that is traced.
Latency timing for recent mediapipe packets.
Used in:
The time represented as 0 in the trace.
The timestamp represented as 0 in the trace.
The list of calculator node names indexed by node id.
The list of stream names indexed by stream id.
Recent packet timing information about each calculator node and stream.
The timing for one packet set being processed at one calculator node.
Used in:
The index of the calculator node in the calculator_name list.
The input timestamp during Open, Process, or Close.
The kind of event, 1=Open, 2=Process, 3=Close, etc.
The time at which the packets entered the calculator node.
The time at which the packets exited the calculator node.
The timing data for each input packet.
The identifying timestamp and stream_id for each output packet.
An identifier for the current process thread.
The kind of event recorded.
Used in:
The timing for one packet across one packet stream.
Used in:
The time at which the packet entered the stream.
The time at which the packet exited the stream.
The identifying timestamp of the packet.
The index of the stream in the stream_name list.
The address of the packet contents.
Data describing the event, such as the packet contents.
Homography according to [h_00 h_01 h_02; h_10 h_11 h_12; h_20 h_21 1]; Note: The parametrization with h_22 = 1 does not always hold, e.g. if the origin (0, 0, 1) gets mapped to the line at infinity (0, 0, 1). However for video we expect small perspective changes between frames and this parametrization improves robustness greatly as it removes an additional DOF. Therefore, all methods in motion_stabilization should not be used for general wide-baseline matching of frames.
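The parametrization above, with h_22 fixed to 1, can be sketched as follows. The function name and the 8-tuple argument layout (h_00, h_01, h_02, h_10, h_11, h_12, h_20, h_21) are assumptions for illustration only.

```python
def apply_homography(h, x, y):
    """Map the point (x, y) through a homography given by its 8 free parameters.

    h holds (h_00, h_01, h_02, h_10, h_11, h_12, h_20, h_21); h_22 is taken as 1.
    """
    h00, h01, h02, h10, h11, h12, h20, h21 = h
    w = h20 * x + h21 * y + 1.0  # homogeneous coordinate of the mapped point
    if w == 0.0:
        raise ValueError("point maps to infinity under this homography")
    return ((h00 * x + h01 * y + h02) / w, (h10 * x + h11 * y + h12) / w)


# A pure translation by (5, -2): the perspective terms h_20 and h_21 are zero.
print(apply_homography((1.0, 0.0, 5.0, 0.0, 1.0, -2.0, 0.0, 0.0), 3.0, 4.0))
```

The division by w is where the perspective terms h_20 and h_21 take effect; when both are zero the mapping is affine.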
Used in: , , ,
Taken from java/com/google/android/libraries/microvideo/proto/microvideo.proto to satisfy leakr requirements TODO: Remove and use above proto.
For each frame, there are 12 homography matrices stored. Each matrix is 3x3 (9 elements). This field will contain 12 x 3 x 3 float values. The first row of the first homography matrix will be followed by the second row of the first homography matrix, followed by third row of first homography matrix, followed by the first row of the second homography matrix, etc.
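The row-by-row layout described above can be unpacked as in the following sketch. The function name and its `count` parameter are hypothetical; only the 12 x 3 x 3 ordering comes from the field description.

```python
def unpack_homographies(values, count=12):
    """Split a flat list of count * 9 floats into count 3x3 matrices.

    Each matrix is stored row by row, and matrices follow one another.
    """
    if len(values) != count * 9:
        raise ValueError(f"expected {count * 9} values, got {len(values)}")
    matrices = []
    for m in range(count):
        base = m * 9  # offset of the first element of matrix m
        matrices.append([values[base + 3 * r: base + 3 * r + 3] for r in range(3)])
    return matrices


flat = [float(i) for i in range(108)]
matrices = unpack_homographies(flat)
print(matrices[0][1])  # second row of the first matrix
print(matrices[1][0])  # first row of the second matrix
```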
Vector containing histogram counts for individual patches in the frame.
The width of the frame at the time metadata was sampled.
The height of the frame at the time metadata was sampled.
Whether the output clone should have pixel data already available on GPU.
Output texture buffer dimensions. The values defined in the options will be overridden by the WIDTH and HEIGHT input streams if they exist.
Rotation angle is counter-clockwise, in radians.
Normalized width and height of the output rect. Value is within [0, 1].
Normalized location of the center of the output rectangle in image coordinates. Value is within [0, 1]. The (0, 0) point is at the (top, left) corner.
Specifies behaviour for crops that go beyond image borders.
Specifies limits for the size of the output image. It will be scaled down, preserving ratio, to fit within. These do not change which area of the input is selected for cropping.
Used in:
First unspecified value is required by the guideline. See details here: https://developers.google.com/protocol-buffers/docs/style#enums
A list of properties extracted from EXIF metadata from an image file.
Image dimensions.
Focal length of camera lens in millimeters.
Focal length of camera lens in 35 mm equivalent.
Focal length in pixels.
(message has no fields)
Used in:
The format is unknown. It is not valid for an ImageFrame to be initialized with this value.
sRGB, interleaved: one byte for R, then one byte for G, then one byte for B for each pixel.
sRGBA, interleaved: one byte for R, one byte for G, one byte for B, one byte for alpha or unused.
Grayscale, one byte per pixel.
Grayscale, one uint16 per pixel.
YCbCr420P (1 bpp for Y, 0.25 bpp for U and V). NOTE: NOT a valid ImageFrame format, but intended for ScaleImageCalculatorOptions, VideoHeader, etc. to indicate that YUVImage is used in place of ImageFrame.
Similar to YCbCr420P, but the data is represented as the lower 10bits of a uint16. Like YCbCr420P, this is NOT a valid ImageFrame, and the data is carried within a YUVImage.
sRGB, interleaved, each component is a uint16.
sRGBA, interleaved, each component is a uint16.
One float per pixel.
Two floats per pixel.
LAB, interleaved: one byte for L, then one byte for a, then one byte for b for each pixel.
sBGRA, interleaved: one byte for B, one byte for G, one byte for R, one byte for alpha or unused. This is the N32 format for Skia.
If true, image region will be extracted and copied into tensor keeping region aspect ratio, which usually results in letterbox padding. Otherwise, if false, image region is stretched to fill output tensor fully.
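To make the letterbox case concrete, here is a minimal sketch of how the scale and padding could be derived when the aspect ratio is kept. It assumes the padding is split evenly between the two sides, which the description above does not state; the function name is also hypothetical.

```python
def letterbox(src_w, src_h, dst_w, dst_h):
    """Return (scale, pad_x, pad_y) for fitting a source region into a tensor.

    The region is scaled uniformly so that it fits entirely inside the
    destination; pad_x and pad_y are the per-side padding in output pixels
    (assumption: padding is centered).
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    pad_x = (dst_w - src_w * scale) / 2.0
    pad_y = (dst_h - src_h * scale) / 2.0
    return scale, pad_x, pad_y


# A 640x480 region placed into a 256x256 tensor is padded at the top and bottom.
print(letterbox(640, 480, 256, 256))
```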
Output tensor element range/type image pixels are converted to.
For CONVENTIONAL mode for OpenGL, input image starts at bottom and needs to be flipped vertically as tensors are expected to start at top. (DEFAULT or unset interpreted as CONVENTIONAL.)
Pixel extrapolation method. When converting image to tensor it may happen that tensor needs to read pixels outside image boundaries. Border mode helps to specify how such pixels will be calculated. BORDER_REPLICATE is used by default.
Pixel extrapolation methods. See @border_mode.
Used in:
Range of float values [min, max]. min must be strictly less than max.
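One plausible reading of the float range is a linear mapping from the input pixel range onto [min, max]. The sketch below assumes 8-bit input in [0, 255] and a linear mapping; both are assumptions for illustration, as is the function name.

```python
def pixel_to_range(value, out_min, out_max, in_max=255.0):
    """Linearly map a pixel value in [0, in_max] to [out_min, out_max]."""
    if not out_min < out_max:
        raise ValueError("min must be strictly less than max")
    return out_min + (value / in_max) * (out_max - out_min)


print(pixel_to_range(0, -1.0, 1.0))    # lower end of the output range
print(pixel_to_range(255, -1.0, 1.0))  # upper end of the output range
```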
Used in:
Output dimensions. Set to 0 if they should be the same as the input.
Counterclockwise rotation mode.
Vertical flipping, applied after rotation.
Horizontal flipping, applied after rotation.
Scale mode.
Padding type. This option is only used when the scale mode is FIT. Default is to use BORDER_CONSTANT. If set to false, it will use BORDER_REPLICATE instead.
Full Example:
node {
  calculator: "InferenceCalculator"
  input_stream: "TENSOR_IN:image_tensors"
  output_stream: "TENSOR_OUT:result_tensors"
  options {
    [mediapipe.InferenceCalculatorOptions.ext] {
      model_path: "model.tflite"
      delegate { gpu {} }
    }
  }
}
Path to the TF Lite model (ex: /path/to/modelname.tflite). On mobile, this is generally just modelname.tflite.
Whether the TF Lite GPU or CPU backend should be used. Effective only when input tensors are on CPU. For input tensors on GPU, GPU backend is always used. DEPRECATED: configure "delegate" instead.
Android only. When true, an NNAPI delegate will be used for inference. If NNAPI is not available, then the default CPU delegate will be used automatically. DEPRECATED: configure "delegate" instead.
The number of threads available to the interpreter. Effective only when input tensors are on CPU and 'use_gpu' is false.
TfLite delegate to run inference. If not specified, TFLite GPU delegate is used by default (as if "gpu {}" is specified) unless GPU support is disabled in the build (i.e., with --define MEDIAPIPE_DISABLE_GPU=1), in which case regular TFLite on CPU is used (as if "tflite {}" is specified) except when building with emscripten where xnnpack is used. NOTE: use_gpu/use_nnapi are ignored if specified. (Delegate takes precedence over use_* deprecated options.)
Used in:
Delegate to run GPU inference depending on the device. (Can use OpenGL, OpenCL, or Metal depending on the device.)
Used in:
Experimental, Android/Linux only. Use TFLite GPU delegate API2 for the NN inference. example: delegate: { gpu { use_advanced_gpu_api: true } }
This option is valid for TFLite GPU delegate API2 only. Set to true to use 16-bit float precision. If max precision is needed, set to false for 32-bit float calculations only.
Load pre-compiled serialized binary cache to accelerate init process. Only available for OpenCL delegate on Android. Kernel caching will only be enabled if this path is set.
This option is valid for TFLite GPU delegate API2 only. Choose any of the available APIs to force running inference with it.
Used in:
Android only.
Used in:
(message has no fields)
Default inference provided by tflite.
Used in:
(message has no fields)
Used in:
Number of threads for XNNPACK delegate. (By default, calculator tries to choose optimal number of threads depending on the device.)
A collection of input data to a CalculatorGraph.
Used in:
The name of the input collection. Name must match [a-z_][a-z0-9_]*
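The naming rule above can be checked with a regular expression, as in this small sketch. The helper name is hypothetical; the pattern is the one given in the field description.

```python
import re

# Pattern taken from the field description: [a-z_][a-z0-9_]*
_NAME_PATTERN = re.compile(r"[a-z_][a-z0-9_]*")


def is_valid_collection_name(name):
    """Return True if the whole name matches [a-z_][a-z0-9_]*."""
    return _NAME_PATTERN.fullmatch(name) is not None


for candidate in ("input_1", "_x", "1abc", "Input", ""):
    print(repr(candidate), is_valid_collection_name(candidate))
```

`fullmatch` is used rather than `match` so that trailing invalid characters are rejected.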
The names of each side packet. The number of side_packet_name must match the number of packets generated by the input file.
DEPRECATED: old way of referring to side_packet_name.
Sets the source of the input collection data. The default value is UNKNOWN.
A file name pointing to the data. The format of the data is specified by the "input_type" field. Multiple shards may be specified using @N or glob expressions.
The input can be specified in several ways.
Used in:
An invalid default value. This value is guaranteed to be the lowest enum value (i.e. don't add negative enum values).
A recordio where each record is a serialized PacketManagerConfig. Each PacketManagerConfig must have the same number of packet factories in it as the number of side packet names. Furthermore, the output side packet name field in each PacketFactoryConfig must not be set. This is the most general input, and allows multiple side packet values to be set in arbitrarily complicated ways before each run.
A recordio where each record is a serialized packet payload. For example a recordio of serialized OmniaFeature protos dumped from Omnia.
A text file where each line is a comma-separated list. The number of elements in each CSV line must be the same as the number of side_packet_name entries (and the order must match). Each line must be less than 1 MiB in size. Lines consisting only of whitespace, or only of whitespace and a pound comment, will be skipped.
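The rules for the CSV input type can be sketched as below. This is a simplified reader written for illustration: it splits naively on commas with no quoting support, and the function name and returned structure are assumptions rather than MediaPipe's actual parser.

```python
MAX_LINE_BYTES = 1 << 20  # each line must be smaller than 1 MiB


def parse_side_packet_csv(lines, side_packet_names):
    """Turn CSV lines into one {side_packet_name: value} dict per data line."""
    rows = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip lines that are blank or contain only whitespace and a pound comment.
        if not stripped or stripped.startswith("#"):
            continue
        if len(line.encode("utf-8")) >= MAX_LINE_BYTES:
            raise ValueError(f"line {number} is 1 MiB or larger")
        values = stripped.split(",")
        if len(values) != len(side_packet_names):
            raise ValueError(
                f"line {number} has {len(values)} values, "
                f"expected {len(side_packet_names)}"
            )
        rows.append(dict(zip(side_packet_names, values)))
    return rows


sample = ["a.mp4,labels.txt\n", "   \n", "  # a comment\n", "b.mp4,other.txt\n"]
print(parse_side_packet_csv(sample, ["video_path", "label_path"]))
```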
This and all higher values are invalid. Update this value to always be larger than any other enum values you add.
A convenient way to specify a number of InputCollections.
This proto should be used only as an input to a calculator, to verify that that case is covered.
Settings specifying an input stream handler.
Used in: ,
Name of the registered input stream handler class.
Options for the input stream handler.
Additional information about an input stream.
Used in:
A description of the input stream. This description uses the Calculator visible specification of a stream. The format is a tag, then an index with both being optional. If the tag is missing it is assumed to be "" and if the index is missing then it is assumed to be 0. If the index is provided then a colon (':') must be used. Examples:
  "TAG" -> tag "TAG", index 0
  "" -> tag "", index 0
  ":0" -> tag "", index 0
  ":3" -> tag "", index 3
  "VIDEO:0" -> tag "VIDEO", index 0
  "VIDEO:2" -> tag "VIDEO", index 2
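The tag and index rules above can be sketched with a small parser. The function name is hypothetical, and the sketch does not validate the characters allowed in a tag.

```python
def parse_tag_index(spec):
    """Split a stream description into (tag, index).

    A missing tag defaults to "" and a missing index defaults to 0.
    """
    if ":" in spec:
        tag, index_text = spec.rsplit(":", 1)
        return tag, int(index_text)
    return spec, 0


for spec in ("TAG", "", ":0", ":3", "VIDEO:0", "VIDEO:2"):
    print(repr(spec), "->", parse_tag_index(spec))
```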
Whether the input stream is a back edge. By default, MediaPipe requires graphs to be acyclic and treats cycles in a graph as errors. To allow MediaPipe to accept a cyclic graph, set the back_edge fields of the input streams that are back edges to true. A cyclic graph usually has an obvious forward direction, and a back edge goes in the opposite direction. For a formal definition of a back edge, please see https://en.wikipedia.org/wiki/Depth-first_search.
Colors for drawing the label(s).
Thickness for drawing the label(s).
The font height in absolute pixels.
The offset of the starting text in horizontal direction in absolute pixels.
The offset of the starting text in vertical direction in absolute pixels.
The maximum number of labels to display.
Specifies the font for the text. Font must be one of the following from OpenCV:
  cv::FONT_HERSHEY_SIMPLEX (0)
  cv::FONT_HERSHEY_PLAIN (1)
  cv::FONT_HERSHEY_DUPLEX (2)
  cv::FONT_HERSHEY_COMPLEX (3)
  cv::FONT_HERSHEY_TRIPLEX (4)
  cv::FONT_HERSHEY_COMPLEX_SMALL (5)
  cv::FONT_HERSHEY_SCRIPT_SIMPLEX (6)
  cv::FONT_HERSHEY_SCRIPT_COMPLEX (7)
Uses Classification.display_name field instead of Classification.label.
Label location.
Used in:
A landmark that can have 1 to 3 dimensions. Use x for 1D points, (x, y) for 2D points and (x, y, z) for 3D points. For more dimensions, consider using matrix_data.proto.
Used in:
Landmark visibility. Should stay unset if not supported. Float score of whether the landmark is visible or occluded by other objects. A landmark is also considered invisible if it is not present on the screen (out of scene bounds). Depending on the model, the visibility value is either the result of a sigmoid or the argument of a sigmoid.
Landmark presence. Should stay unset if not supported. Float score of whether landmark is present on the scene (located within scene bounds). Depending on the model, presence value is either a result of sigmoid or an argument of sigmoid function to get landmark presence probability.
Group of Landmark protos.
Ignore the rotation field of rect proto for projection.
Default behaviour and fast way to disable smoothing.
Used in:
(message has no fields)
For the details of the filter implementation and the procedure of its configuration please check http://cristal.univ-lille.fr/~casiez/1euro/
Used in:
Frequency of incoming frames, in frames per second. Used only if it can't be calculated from the provided events (e.g. on the very first frame).
Minimum cutoff frequency. Start by tuning this parameter while keeping `beta = 0` to reduce jittering to the desired level. 1Hz (the default value) is a good starting point.
Cutoff slope. After `min_cutoff` is configured, start increasing `beta` value to reduce the lag introduced by the `min_cutoff`. Find the desired balance between jittering and lag.
Cutoff frequency for the derivative. It is set to 1Hz in the original algorithm, but can be tuned to further smooth the speed (i.e. the derivative) of the object.
If calculated object scale is less than given value smoothing will be disabled and landmarks will be returned as is.
Disable value scaling based on object size and use `1.0` instead. If not disabled, value scale is calculated as inverse value of object size. Object size is calculated as maximum side of rectangular bounding box of the object in XY plane.
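For orientation, the sketch below implements the published One Euro filter for a single scalar signal sampled at a fixed rate, showing how the minimum cutoff, `beta`, and the derivative cutoff interact. It is not MediaPipe's implementation: it ignores timestamps and object-scale handling, and the class and argument names are assumptions.

```python
import math


class OneEuroFilter:
    """Scalar One Euro filter with a fixed sampling frequency (simplified)."""

    def __init__(self, frequency=30.0, min_cutoff=1.0, beta=0.0, derivative_cutoff=1.0):
        self.frequency = frequency
        self.min_cutoff = min_cutoff
        self.beta = beta
        self.derivative_cutoff = derivative_cutoff
        self._x_prev = None
        self._dx_prev = 0.0

    def _alpha(self, cutoff):
        # Smoothing factor of a first-order low-pass filter at this cutoff.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        te = 1.0 / self.frequency
        return 1.0 / (1.0 + tau / te)

    def filter(self, x):
        if self._x_prev is None:
            self._x_prev = x
            return x
        # Low-pass filter the speed using the derivative cutoff.
        dx = (x - self._x_prev) * self.frequency
        a_d = self._alpha(self.derivative_cutoff)
        dx_hat = a_d * dx + (1.0 - a_d) * self._dx_prev
        # Faster motion raises the cutoff, which reduces lag.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff)
        x_hat = a * x + (1.0 - a) * self._x_prev
        self._x_prev, self._dx_prev = x_hat, dx_hat
        return x_hat


smooth = OneEuroFilter(beta=0.0)
responsive = OneEuroFilter(beta=1.0)
for sample in (1.0, 2.0):
    print(smooth.filter(sample), responsive.filter(sample))
```

With `beta = 0` the cutoff stays at the minimum and the output lags a step change; a larger `beta` moves the output closer to the new value, matching the tuning procedure described above.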
Used in:
Number of value changes to keep over time. Higher value adds to lag and to stability.
Scale to apply to the velocity calculated over the given window. With higher velocity `low pass filter` weights new values higher. Lower value adds to lag and to stability.
If calculated object scale is less than given value smoothing will be disabled and landmarks will be returned as is.
Disable value scaling based on object size and use `1.0` instead. If not disabled, value scale is calculated as inverse value of object size. Object size is calculated as maximum side of rectangular bounding box of the object in XY plane.
A subset of indices to be included when creating the detection.
Number of dimensions to convert. Must be within [1, 3].
Specifies the landmarks to be connected in the drawing. For example, the landmark_connections value of [0, 1, 1, 2] specifies two connections: one that connects landmarks with index 0 and 1, and another that connects landmarks with index 1 and 2.
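The flat encoding of connections can be turned into index pairs as in this sketch. The helper name is hypothetical.

```python
def connection_pairs(landmark_connections):
    """Group a flat list [a0, b0, a1, b1, ...] into [(a0, b0), (a1, b1), ...]."""
    if len(landmark_connections) % 2 != 0:
        raise ValueError("landmark_connections must have an even number of entries")
    return list(zip(landmark_connections[0::2], landmark_connections[1::2]))


print(connection_pairs([0, 1, 1, 2]))
```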
Color of the landmarks.
Color of the connections.
Thickness of the drawing of landmarks and connections.
Change color and size of rendered landmarks based on its z value.
Use landmarks visibility while rendering landmarks and connections. If landmark is not visible, neither it nor adjacent connections will be rendered.
Threshold to determine visibility of the landmark. A landmark with visibility greater than or equal to the threshold is considered visible.
Use landmarks presence while rendering landmarks and connections. If landmark is not present, neither it nor adjacent connections will be rendered.
Threshold to determine presence of the landmark. A landmark with presence greater than or equal to the threshold is considered present.
Min thickness of the drawing for landmark circle.
Max thickness of the drawing for landmark circle.
Gradient color for the lines connecting landmarks at the minimum depth.
Gradient color for the lines connecting landmarks at the maximum depth.
Linear similarity model: [a -b; b a] * x + [dx; dy]
Used in: ,
By default, set the file open mode to 'rb'. Otherwise, set the mode to 'r'.
Used in:
A mask of size equivalent to the image size. It encodes a region, which can be thought of as a foreground object mask.
Used in:
Dimensions of the mask.
A rasterization-like format for storing the mask.
A bounding box in pixel units. The box is defined by its upper left corner (xmin, ymin) and its width and height.
Used in:
The supported formats for representing location data. A single location must store its data in exactly one way.
Used in:
The full image. This is a handy format when one needs to refer to the full image, e.g. one uses global image labels. No other fields need to be populated.
A rectangle aka bounding box of an object. The field bounding_box must be used to store the location data.
A rectangle aka bounding box of an object, defined in coordinates normalized by the image dimensions. The field relative_bounding_box must be used to store the location data.
A foreground mask. The field mask must be used to store the location data.
A bounding box. The box is defined by its upper left corner (xmin, ymin) and its width and height, all in coordinates normalized by the image dimensions.
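Converting a normalized box back to pixel units only needs the image dimensions, as the sketch below shows. The function name and the choice to round to the nearest pixel are assumptions made for this example.

```python
def to_pixel_box(xmin, ymin, width, height, image_width, image_height):
    """Convert a box normalized by the image dimensions to pixel units.

    Returns (xmin, ymin, width, height) rounded to whole pixels.
    """
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(width * image_width),
        round(height * image_height),
    )


print(to_pixel_box(0.25, 0.5, 0.5, 0.25, image_width=640, image_height=480))
```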
Used in:
A keypoint. The keypoint is defined by the coordinates (x, y), normalized by the image dimensions.
Used in:
A way to identify a part of an image. A locus does not need to correspond to a subset of pixels -- e.g. for a local descriptor we might define a locus in terms of its location and scale, even if the support of the descriptor is the entire image (with location-dependent weighting).
A unique identifier for the locus. It is meaningless to compare the locus_ids in different images. The client should also not assume that applying the same processing to the same image multiple times will produce the same locus_id.
"Concatenatable" loci have the property that they appear in the same number and order for all images, so their corresponding features can be concatenated. Examples of concatenatable loci include global loci, those corresponding to fixed bounding boxes, or a single most salient region. Loci produced by segmentation with a variable number of segments, on the other hand, are not concatenatable. This flag is true by default.
Required if locus_type = BOUNDING_BOX. Specifies a bounding box for the label.
Specifies a timestamp if this locus appears in a video. The timestamp is specified in milliseconds from the start of the video and refers to the beginning of the locus.
Required if locus_type = REGION. Specifies a region using a scanline encoding.
Required if locus_type = VIDEO_TUBE. Specifies the component loci of the tube.
Types of image loci on the granularity of the annotation.
Used in:
The whole image, without localization.
The locus refers to a specified bounding box. Requires bounding_box below.
The locus refers to specified regions in the image. Requires region below.
This locus refers to groups of loci. Requires component_locus below.
Whether to negate the result.
Optional bool input values.
The logical operation to apply.
Used in:
Selects which channel of the MASK input to use for masking.
Used in:
Proto for serializing Matrix data. Data are stored in column-major order by default.
Order in which the data are stored. Defaults to COLUMN_MAJOR, which matches the default for mediapipe::Matrix and Eigen::Matrix*.
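The difference between the two storage orders can be seen in the following sketch, which rebuilds a rows x cols matrix from a flat list. The function name and the boolean `row_major` flag are assumptions for illustration; the proto itself uses a layout enum.

```python
def unpack_matrix(packed, rows, cols, row_major=False):
    """Rebuild a rows x cols nested list from flat data.

    Column-major (the default here, as in the proto) stores each column
    contiguously; row-major stores each row contiguously.
    """
    if len(packed) != rows * cols:
        raise ValueError("packed data does not match rows * cols")
    if row_major:
        return [[packed[r * cols + c] for c in range(cols)] for r in range(rows)]
    return [[packed[c * rows + r] for c in range(cols)] for r in range(rows)]


data = [1, 2, 3, 4, 5, 6]
print(unpack_matrix(data, 2, 3))                  # column-major interpretation
print(unpack_matrix(data, 2, 3, row_major=True))  # row-major interpretation
```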
Used in:
Options used by a MediaPipe object.
Used in: , , , ,
(message has no fields)
Used in:
Total number of frequency bands to use.
Lower edge of lowest triangular Mel band.
Upper edge of highest triangular Mel band.
Stores offsets for random seek and time offsets for each frame of TrackingData. Stream offsets are specified relative to the end of the metadata blob. Offsets specify the start of the corresponding binary encoded TrackingContainer (for TrackingContainerFormat) or BinaryTrackingData proto (for TrackingContainerProto).
TrackingContainer::header = "META"
Used in:
Used in:
Time offset of the metadata in msec.
Offset of TrackingContainer or
Specification of the underlying mel filterbank.
How many MFCC coefficients to emit.
Used in:
Used in:
Used in:
Specifies which degrees of freedom vary across the mixture. Can be used to implement several transformation functions more quickly.
Used in:
All dof are variable.
Only translation (h_02, h_12) varies.
Translation (h_02, h_12) and skew-rotation (h_01, h_10) vary.
Mixture is constant.
Mixture models with higher degrees of freedom, according to \sum_i model(i) * weight(i), where weights are passed during transform and are expected to sum to one.
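The weighted sum above can be sketched for models represented as flat parameter lists. That representation, the function name, and the tolerance on the weight sum are all assumptions made for this example.

```python
def blend_models(models, weights, tol=1e-6):
    """Return sum_i models[i] * weights[i], computed per parameter.

    Each model is a flat list of parameters; weights are expected to sum to one.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights are expected to sum to one")
    size = len(models[0])
    return [sum(w * m[i] for m, w in zip(models, weights)) for i in range(size)]


# Two translation-like models blended with equal weights.
print(blend_models([[1.0, 0.0, 0.0], [3.0, 0.0, 2.0]], [0.5, 0.5]))
```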
Next tag: 10
If activated when SELECTION input is activated, will replace the computed camera motion (for any of the ANALYSIS_* cases above) with the one supplied by the frame selection, in case the frame selection one is more stable. For example, if the recomputed camera motion is unstable but the one from the selection result is stable, the stable result is used instead.
Determines number of homography models per frame stored in the CSV file or the homography metadata in META. For values > 1, MixtureHomographies are created.
Used for META_ANALYSIS_HYBRID. Rejects features whose flow deviates by more than domain_ratio * image diagonal size from the ground truth metadata motion.
If true, the MotionAnalysisCalculator will skip all processing and emit no packets on any output. This is useful for quickly creating different versions of a MediaPipe graph without changing its structure, assuming that downstream calculators can handle missing input packets. TODO: Remove this hack. See b/36485206 for more details.
Determines how optional input META is used to compute the final camera motion.
Used in:
Uses metadata supplied motions as is.
Seeds visual tracking from metadata motions - estimates visual residual motion and combines with metadata.
Determines how optional input SELECTION (if present) is used to compute the final camera motion.
Used in:
Recompute camera motion for selected frame neighbors.
Use composited camera motion and region flow from SELECTION input. No tracking or re-computation is performed. Note that in this case only CAMERA, FLOW and VIDEO_OUT tags are supported as output.
Recompute camera motion for selected frame neighbors using features supplied by SELECTION input. No feature tracking is performed.
Recomputes camera motion for selected frame neighbors but seeds initial transform with camera motion from SELECTION input.
Settings for MotionAnalysis. This class computes sparse, locally consistent flow (referred to as region flow), camera motions, and foreground saliency (i.e. likely foreground objects moving differently from the background). Next tag: 16
Used in:
Options for the actual motion stabilization (in order of object usage).
Clip-size used for (parallelized) motion estimation.
If set, camera motion is subtracted from features before output. Effectively outputs residual motion w.r.t. the background.
If flow_options().tracking_options().tracking_policy() equals POLICY_MULTI_FRAME, this flag indicates which RegionFlowFeatureList to use. Specifically, for frame C, we use the motion from C to C - 1 - track_index.
If set, compute motion saliency (regions of moving foreground).
Selects saliency inliers (only saliency locations with sufficient spatial and temporal support are kept). Only applied when compute_motion_saliency is set.
Performs spatio-temporal filtering of extracted foreground saliency. If used with above selection of saliency inliers, filtering is performed *after* inlier selection. Only applied when compute_motion_saliency is set.
If set, irls weights of motion estimation are spatio-temporally smoothed after model estimation.
If a rejection_transform is passed to AddFrameGeneric, features that do not agree with the transform within below threshold are removed.
Pre-configured policies for MotionAnalysis. For general use, it is recommended to select an appropriate policy instead of customizing flow and motion options by hand. Policies are kept up to date with appropriate settings.
Used in:
Default legacy options. Effectively a no-op.
Use for video.
Use for video on mobile.
Use if applied to camera stream on mobile, e.g. low latency and high throughput. ASSUMES DOWNSAMPLED INPUT, e.g. from GPU.
Use for sped up video / hyperlapse when adding frames with seeds and rejection transforms. Mostly ups temporal consistency weights and relaxes stability constraints. Only recommended to be used as second pass after initial MotionAnalysis and FrameSelection.
Describes how to compute foreground from features.
Used in:
Indicates the *inverse* registration error (i.e. the irls weight) that is deemed a complete inlier. Weights in the interval [0, foreground_threshold] (corresponding to pixel errors in the interval [1 / foreground_threshold, inf]) are mapped to 1 - [0, 1], i.e. foreground threshold is mapped to zero with weights below the threshold being assigned values > 0. Therefore, larger values will increase the amount of detected foreground as well as noise.
By using foreground_gamma < 1.0 you can increase resolution of small foreground motion at the expense of the resolution of large foreground motions.
Threshold is scaled by coverage, i.e. for frames with large registration error less foreground is visualized.
Adapts visualization for rendered_results when passed to GetResults.
Used in:
Visualizes tracked region flow features, colored w.r.t. fitting error.
Visualizes salient points. Only applicable if compute_motion_saliency is set to true.
Line thickness of ellipse when rendering salient points.
Instead of green burn in uses jet coloring to indicate magnitude of foreground motion.
If set, only keeps masks of pixels that are used for blur analysis; the rest is set to zero.
Only long feature tracks with specified minimum length are rendered. Set to zero to consider all tracks.
Only the last N points of a long feature track are rendered. Set to zero to render all points.
Captures additional internal state info about the tracking.
Used in:
Stores all motion vectors that were used for tracking as packed arrays, capturing position, object motion, camera motion, tracking id and corresponding inlier weight.
Within [0, 1]. 0 = outlier; 1 = inlier.
Next tag: 38
Position (top-left corner) and fixed size of the current MotionBox, specified w.r.t. normalized domain (in [0, 1] along both dimensions).
Optional degrees of freedom; scale and rotation w.r.t. center of the box, i.e. [pos_x, pos_y] + 0.5 * [width, height]. To activate see TrackStepOptions::TrackingDegrees.
in radians.
This field is only used when we try to track under TRACKING_DEGREE_OBJECT_PERSPECTIVE.
Aspect ratio (width / height) for the tracked rectangle in physical space.
Whether we want this box to be potentially grouped with other boxes to track together. This is useful for tracking small boxes that lie on a plane. For example, when we detect a plane, track the plane, then all boxes within the plane can share the same homography transform.
For quad tracking using the PnP solver: whether we use perspective-n-points to track the quad between frames. That mode requires: 1. The quad being tracked is a rectangle in the physical world. 2. The `aspect_ratio` field has to be set in MotionBoxState.
Object velocity in x and y, specified as normalized spatial units per standard frame period (here calibrated w.r.t. kTrackingDefaultFps = 30 FPS), that is 33.3 ms. Object velocity refers to velocity after subtracting camera motion. If the current frame period is 66.67 ms (i.e. 15 fps), the actual velocity is obtained by multiplying by a factor of 2. Similarly, for 60 fps the factor is 0.5. The standard frame period is chosen for legacy reasons to keep TrackStepOptions defaults.
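The frame-period scaling described above amounts to multiplying by the ratio of the actual frame period to the standard one. The sketch below illustrates this; the function name and the tuple representation of velocity are assumptions.

```python
STANDARD_FPS = 30.0  # kTrackingDefaultFps, per the field description


def actual_velocity(stored_velocity, frame_period_ms):
    """Rescale a velocity stored per standard frame period to the actual period."""
    standard_period_ms = 1000.0 / STANDARD_FPS
    factor = frame_period_ms / standard_period_ms
    return tuple(v * factor for v in stored_velocity)


print(actual_velocity((0.01, -0.02), 1000.0 / 15.0))  # 15 fps: factor of 2
print(actual_velocity((0.01, -0.02), 1000.0 / 60.0))  # 60 fps: factor of 0.5
```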
Weighted average of object velocity magnitude of inlier points (expressed in normalized spatial units per standard frame period).
Specifies how valid the prior was in the last step.
Spatial prior (presence of inliers, i.e. where is the object located within the box that is currently being tracked) as a pair of a) prior (in [0, 1]) and b) confidence (number of features converted to score within [0, 1]). Prior is defined over a grid of size spatial_prior_grid_size x spatial_prior_grid_size.
Difference score between previous prior and current prior (in [0, 1]). Currently not used.
Score determining how much predicted motion disagrees with measured motion. If measured motion deviates strongly from predicted motion, disparity is +/-1, if motion agrees with predicted motion, disparity is 0. Sign indicates measured motion is accelerating (> 0) or de-accelerating (< 0) w.r.t. predicted motion.
Score determining how discriminative estimated motion model is. In [0, 1] where 0 no discrimination w.r.t. background and 1 high discrimination.
Center of mass for inliers after tracking (center of feature that were used for motion estimation)
Approximate number of inliers (each feature scores a zero [outlier] or one [inlier]).
Ratio of above inlier_sum to average inlier_sum across last states.
Extent (width and height of inliers).
Set of current inlier tracking ids.
Corresponding x,y coordinates for each inlier.
Corresponding inlier score (currently: length of inlier observed).
Set of outlier ids.
Corresponding x,y coordinates for each outlier.
Confidence of box tracked in the range [0, 1], with 0 being least confident, and 1 being most confident. A reasonable threshold is 0.5 to filter out unconfident boxes.
Additional internal state.
Used in: ,
Vertex 0 is according to x_0 = vertices(0), y_0 = vertices(1)
Vertex 1 is according to x_1 = vertices(2), y_1 = vertices(3)
Vertex 2 is according to x_2 = vertices(4), y_2 = vertices(5)
Vertex 3 is according to x_3 = vertices(6), y_3 = vertices(7)
Order of vertices should be aligned in counter-clockwise manner:
0---------3
|         |
|         |
1---------2
Tracking status indicating the result of tracking:
  UNTRACKED: Box cannot be tracked (either out of bound or too many tracking failures).
  EMPTY: Box has size of <= 0 along at least one of its dimensions (collapsed).
  NO_FEATURES: No features found within the box, tracking is not possible.
  TRACKED: Successful tracking.
  DUPLICATED: Successfully tracked, but duplicated from previous result as frame was duplicated.
  BOX_TRACKED_OUT_OF_BOUND: Successfully tracked, out of bound from screen area. Will advance by camera motion. Only used for static objects.
Used in:
Note: In general for Estimation modes, the prefixes are used as follows:
  L2: minimize squared norm of error
  IRLS: iteratively reweighted least squares, L2 minimization using multiple iterations, downweighting outliers.
Next tag: 69
Used in:
Specifies which camera models should be estimated, translation is always estimated.
By default, homography estimation minimizes an objective that is not strictly the L2 distance between matched points. If the flag is set, each row of the linear system is scaled with the exact denominator which results in an objective that minimizes the L2 distance.
Per default, we use exact solver for over-determined system using well-conditioned QR decomposition. For better speed, set value to false to use estimation via normal equations.
If set uses double instead of float when computing normal equations.
Regularizer for perspective part of the homography. If zero, no regularization is performed. Should be >= 0.
If row-wise mixture models are estimated, determines number of them. Note, changing number of mixtures, interpolation sigma and regularizer is very likely to impact the stability analysis for mixtures and rolling shutter scoring. At least MixtureHomographyBounds would need to be adjusted to the new values.
If row-wise mixture models are estimated, determines how much each point is influenced by its neighboring mixtures. Specified as relative sigma (standard deviation) w.r.t. frame_height.
Mixture estimation uses L2 regularizer to assure that adjacent mixture models are similar.
Mixtures are estimated across a spectrum of exponentially increasing regularizers. In particular the regularizer at level L is given as mixture_regularizer * mixture_regularizer_base^L. A maximum of 10 levels are supported (checked!). Note: When changing the number of levels you probably want to adapt the MotionStabilizationOptions::rolling_shutter_increment value as well, as the number of levels directly controls the highest threshold for the rolling shutter index analysis.
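The per-level regularizer formula above can be illustrated with a short sketch. It assumes that levels are numbered from 0; the function name and the sample values are hypothetical and are not the proto defaults.

```python
def regularizer_levels(mixture_regularizer, mixture_regularizer_base, num_levels):
    """Return the regularizer for each level L = 0 .. num_levels - 1.

    Assumption: levels start at 0, so level 0 uses mixture_regularizer itself.
    """
    if not 1 <= num_levels <= 10:
        raise ValueError("between 1 and 10 levels are supported")
    return [
        mixture_regularizer * mixture_regularizer_base ** level
        for level in range(num_levels)
    ]


# Illustrative values only.
for level, value in enumerate(regularizer_levels(1e-4, 2.0, 5)):
    print(level, value)
```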
IRLS rounds to down-weight outliers (default across all models). Note: IRLS in combination with full mixture models (as opposed to the default reduced ones) is somewhat expensive.
If set to > 0 (always needs to be less than 1.0), influence of supplied prior irls weights is linearly decreased from the specified prior scale (weight 1.0) to prior_scale. Effectively, biases the solution to the supplied prior features. Note: Without irls_weights_preinitialized set to true, this option is effectively a no-op. TODO: Retire this option.
Determine how to normalize irls weights w.r.t. average motion magnitude. In general a residual of 1 pixel is assigned an IRLS weight of 1. However as larger motions in general are affected by a larger error, we normalize irls weights, such that a residual of distance of irls_motion_magnitude_fraction times <average translation magnitude> equals an IRLS weight of 1. Must be larger than zero.
Scale that is applied for mixture (where error is expected to be bigger).
By default, irls weight of all features are set uniformly to one before estimating EACH model, refining them in subsequent irls iterations. If flag below is set, input irls weights are used instead for each motion model.
If weights are pre-initialized, optionally min filter weights along track ids when long tracks are used. This can be used to consistently label outliers in time before estimation.
Normalizes features' irls weights prior to estimation such that features in high density areas are downweighted. Multiplicative in case irls_weights_preinitialized is set to true.
A regular grid of size feature_mask_size x feature_mask_size is used to normalize features w.r.t. their density.
If specified, only features that agree with the estimated linear similarity will be used to estimate the homography. If set, linear_similarity_estimation can not be ESTIMATION_NONE! (checked)
Max. deviation to be considered an inlier w.r.t. estimated similarity for above flag. This value is set w.r.t. normalized frame diameter. TODO: Should take GetIRLSResidualScale into account.
Scale for stricter coverage evaluation. Used for rolling shutter guess computation, by only using high quality inliers. Larger values reflect stricter coverage. Specifically, when computing coverage via GridCoverage call, frac_inlier_threshold is reduced (divided) by specified scale below.
By default, frames with zero trackable features (e.g. at the beginning, empty frame or shot boundary) are set to the identity model but still labeled as valid. If set to false, these frames are flagged as invalid, which can be useful to locate shot boundaries, etc.
Setting for temporal smoothing of irls weights in optional post-processing step. In normalized coordinates w.r.t. frame domain.
Frame diameter across which smoothing is performed.
in frames.
Bilateral weight (for un-normalized color domain [0, .. 255]).
If set to false 3 taps are used.
If set, during temporal smoothing, each frame is weighted by its confidence, defined as the square coverage (or square mean mixture coverage). Therefore, low confidence fits do not erroneously propagate over time. In addition, if the confidence is below the specified confidence_threshold (relative to the maximum coverage observed in the test interval), irls weights are reset to 1, i.e. biased to agree with the (unknown) background motion.
Calls TextureFilteredRegionFlowFeatureIRLSWeights on computed irls weights before smoothing them.
Attempts to detect overlays, i.e. static elements burned-into the video that potentially corrupt motion estimation.
Overlay detection is performed over specified number of frames.
By default, irls weights of each feature are overwritten with refined irls weights of the last iteration for the highest degree of freedom model that was estimated stable. If set to false, original irls weights are retained. Note: If overlay detection is activated, features deemed to be overlays have their irls weight set to zero, regardless of this setting. Similarly, an IRLSWeightFilter is applied if requested, regardless of this setting.
IRLS weights for homography estimation are initialized based on the specified options. If the option irls_weights_preinitialized is set, weights are multiplied instead of reset.
If set to false, use L1 norm irls weights instead of L0 norm irls weights.
IRLS weights are determined in a limited domain (in particular helpful for stabilization analysis on HD videos). TODO: Make this the default.
For comparison and debugging purposes. Simply estimates requested models without checking their stability via the stable_*_bounds parameters. However, invertibility is still checked to avoid invalid data being passed to later stages of the stabilizer.
Projects higher order motions if estimated correctly down to lower order motions, therefore replacing the previously estimated motions.
DEPRECATED functionality. Use static functions as indicated instead. Non-linear similarity, use MotionEstimation::EstimateSimilarityModelL2.
Used in:
Controls how multiple models via EstimateMotionsParallel are estimated.
Used in:
Models are estimated independently across frames in parallel.
Previous frame's estimation biases current one, controlled via above IrlsMaskOptions.
Frame's estimation is biased along long features, controlled via above LongFeatureBiasOptions.
Estimation is performed jointly over
If any parameter of the estimated homography exceeds these bounds, we deem it UNSTABLE_SIM and use estimated similarity instead.
Used in:
1 / 0.8.
15 degrees.
Inlier coverage is only tested for if average homography error exceeds registration_thresholds. Max of the following two thresholds is used. Absolute in pixels.
Scaled by frame diameter.
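A minimal sketch of how the two registration thresholds above could be combined. It assumes that "frame diameter" means the length of the frame diagonal in pixels; the function and parameter names are hypothetical.

```python
import math


def registration_threshold(absolute_px, fraction_of_diameter, width, height):
    """Return the larger of the absolute and the diameter-scaled threshold."""
    # Assumption: frame diameter is the diagonal length in pixels.
    frame_diameter = math.hypot(width, height)
    return max(absolute_px, fraction_of_diameter * frame_diameter)


# For a 640x480 frame the diagonal is 800 pixels.
print(registration_threshold(1.5, 0.001, 640, 480))
print(registration_threshold(0.5, 0.001, 640, 480))
```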
Minimum fraction of inlier features w.r.t. frame area.
Grid coverage inlier threshold. Pixel errors below this threshold are considered inliers. Defined w.r.t. frame diameter, approx. 1.5 for 16:9 SD video (480p), i.e. threshold is multiplied by frame diameter.
Used in:
Weight initialization for homography estimation. This is to bias homography estimation either to foreground or background.
Used in:
Constant, treat all features equally.
Weight features in the center higher.
Tends to lock onto foreground.
Weight features around the
Filters irls weights before smoothing them according to specified operation.
Used in:
Irls initialization can be performed in a temporally dependent manner (if estimation_policy() == TEMPORALLY_DEPENDENT), where the previous frame's motion estimation biases the IrlsInitialization of the currently processed frame. In particular the location and magnitude of inliers are used during the RANSAC selection stage, to favor those features that agree with the prior, represented as a confidence mask of inliers (using same dimension as above feature_mask_size). After estimation, the prior is updated.
Used in:
Amount prior is decayed after each iteration.
Score that each inlier adds to the current prior. Specified w.r.t. total number of features, i.e. each feature increases a bin's score by inlier_score.
Each inlier scores at least this value regardless of the inlier mask (additive).
Motions are scored relative to previous motion. Threshold denotes absolute minimum of denominator.
Translation is updated in every step by blending it with the previous estimated translation. (alpha is within 0 to 1, where 0 indicates to use only measured translation, i.e. no blending).
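The blending described above can be sketched as a convex combination. This assumes plain linear interpolation, which is consistent with alpha = 0 meaning "measured translation only" but is not confirmed by the source; the function name is hypothetical.

```python
def blend_translation(measured, previous, alpha):
    """Blend the measured translation with the previous estimate.

    Assumption: linear interpolation, where alpha = 0 keeps only the
    measured translation and alpha = 1 keeps only the previous estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be within [0, 1]")
    return tuple(
        (1.0 - alpha) * m + alpha * p for m, p in zip(measured, previous)
    )


print(blend_translation((2.0, 4.0), (0.0, 0.0), 0.0))  # no blending
print(blend_translation((2.0, 4.0), (0.0, 0.0), 0.5))  # halfway
```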
Every time translation is updated, prior (in [0, 1]) is increased by the specified amount.
If activated, irls weight of outlier features are reset. Outliers are defined as those features, for which the best model fit after #rounds iterations of RANSAC did NOT yield an error lower than cutoff. Only applies to translation and similarity estimation.
Used in:
Describes how long feature tracks are leveraged for joint estimation across many frames.
Used in:
For each frame-pair motion model, describing the motion between frame I and I - 1, additionally estimate several motion models along long feature tracks describing the motion between frame I and I - k * motion_stride (additional models are not output, but help to filter irls weights). Specifies total number of estimated motion models per frame-pair. Must be greater than zero.
Spacing in frames for additional motion models.
If set, performs temporal smoothing across frames of the obtained irls weights.
Used in:
L2 estimation
good performance, robust to outliers.
DEPRECATED modes.
DEPRECATED, use IRLS instead.
DEPRECATED, use IRLS instead, or static
Options being used to bias IRLS features if estimation mode TEMPORAL_LONG_FEATURE_BIAS is being used. Next Tag: 15
Used in:
Estimation is performed multiple times, alternating between model estimation and smooth temporal feature biasing for the specified number of rounds.
Controls how fast the bias for a track gets updated, in case feature is an inlier. Use higher values for less decay of background motion over time.
Same as above for outliers (or features with low prior), i.e. those that got recently seeded.
Number of elements after which we deem estimation to be stable. Used to control weight of bias if fewer than the specified number have been observed. Also used as maximum ring buffer size (only most recent number of observations are kept). Must be > 0.
Change in irls weight magnitude (from outlier to inlier) above which we reset the current bias.
Irls weight above which we consider it to be an inlier for bias update purposes (see above inlier and outlier bias). By default, outliers are allowed to update their bias faster than inliers. Must be > 0.
Standard deviation used during feature initialization. Current bias of a track is used to pre-weight features via gaussian weighting with specified standard deviation.
When seeding new tracks (on the first frame), we bilaterally pool neighboring feature biases as seed. Details are controlled by options below. If false, the feature's estimation error is used instead (faster, but less spatially smooth). If activated it is advised to use a patch descriptor radius of at least 20 pixels.
Newly observed tracks' biases are seeded by similar looking features in close spatial proximity. For efficiency a grid is used to determine proximity. Grid size in normalized coordinates w.r.t. frame domain.
Sigmas for combining feature biases.
Defines what we consider to be a long track. Features spawned around locations of similar looking long tracks are considered to have high prior, e.g. their initialization is given more weight.
Determines which fraction of long tracks is considered to be sufficient for highly confident bias seed.
If activated, uses the irls weights from the estimation of the lower degree of freedom model to seed the bias of the higher degree of freedom model. This improves rigidity of the computed motion.
In addition to above outlier and density initialization, long features that are present for a specified ratio of the analysis interval can be upweighted. This greatly improves temporal consistency.
Used in:
Tracks with a length greater than or equal to the specified percentile are upweighted by the specified upweight_multiplier.
Features passing above test have their irls weight increased by the specified multiplier prior to estimation.
If any parameter of the estimated homography mixture exceeds these bounds, we deem it UNSTABLE_HOMOG and use the estimated homography instead.
Used in:
Minimum fraction of inlier features w.r.t. block area.
Each block is tested to be stable, regarding the outliers. A frame is labeled unstable if at least the specified number of adjacent blocks are labeled outliers.
Maximum number of adjacent empty blocks (no inliers).
Grid coverage inlier threshold. See identical parameter in HomographyBounds.
Note: Mixture models have high DOF and are much more affected by outliers than models above. It is recommended that if IRLS estimation is NOT used, mixture_regularizer is increased by a factor >=3.
Used in:
robust to outliers.
Degree of freedom of estimated homography mixtures. If desired, specific parts of the homography can be held constant across the mixture. For fast draft TRANSLATION_MIXTURE is recommended, for high quality SKEW_ROTATION_MIXTURE.
Used in:
8 dof * num_mixtures
6 dof + 2 dof * num_mixtures
4 dof + 4 dof * num_mixtures
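The three degree-of-freedom counts above can be tabulated with a small sketch. The pairing of each formula with an enum name is an assumption based on the listing order and on the names mentioned in the description; in particular the name FULL_MIXTURE does not appear in this text and is hypothetical.

```python
# Assumed pairing of mixture type to DOF formula (not confirmed by the source).
DOF_FORMULAS = {
    "FULL_MIXTURE": lambda n: 8 * n,               # 8 dof * num_mixtures
    "TRANSLATION_MIXTURE": lambda n: 6 + 2 * n,    # 6 dof + 2 dof * num_mixtures
    "SKEW_ROTATION_MIXTURE": lambda n: 4 + 4 * n,  # 4 dof + 4 dof * num_mixtures
}


def mixture_dof(mixture_type, num_mixtures):
    """Return the total degrees of freedom for a homography mixture."""
    if num_mixtures < 1:
        raise ValueError("num_mixtures must be positive")
    return DOF_FORMULAS[mixture_type](num_mixtures)


for name in DOF_FORMULAS:
    print(name, mixture_dof(name, 10))
```

With a single mixture all three variants reduce to the 8 degrees of freedom of a plain homography, which is a quick sanity check on the formulas.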
Used in:
Potential overlay features are aggregated over a mask with cells mask_size x mask_size as specified below.
A feature is a strict overlay feature if its motion is less than near_zero_motion AND less than max_translation_ratio times the estimated translation magnitude at that frame AND its texturedness is sufficiently high.
Minimum texturedness of a feature to be considered an overlay. Motivation: Overlays are mostly text or graphics, i.e. have visually distinguished features.
A feature is a loose overlay feature if its motion is less than loose_near_zero_motion.
Minimum fraction of strict overlay features within a cell to be considered an overlay cell.
Absolute minimum number of strict overlay features within a cell to be considered an overlay cell.
Shot boundaries are introduced in 3 different scenarios: a) Frame has zero tracked features w.r.t. previous frame b) Estimated motion is deemed invalid (CameraMotion::INVALID). c) Visual consistency is above threshold of two adjacent frames.
Used in:
After cases a & b are determined from features/camera motion, they are verified by ensuring visual consistency is above specified threshold, if visual consistency has been computed. Only if this is the case will the frame be labeled as shot boundary. Motivation is that there should always be some (even small) measurable increase in the frame difference at a shot boundary. Verification is only performed if visual_consistency has been evaluated (value >= 0).
Threshold for case c). Sometimes, motion estimation will miss shot boundaries. We define shot boundaries for which the visual consistency is higher than the specified threshold for at least two adjacent frames.
If any test/bound is violated, the motion is deemed UNSTABLE.
Used in:
Input frame has to be labeled stable, i.e. enough features and coverage present.
Minimum number of inlier features (absolute and as fraction of total number of features). TODO: Dataset run setting this to 0.15
Bounds on valid similarities. We use larger values compared to homographies. Note: Bounds are necessary to guarantee invertibility of the resulting similarity.
1 / 0.8.
15 degrees.
Thresholds for a feature to be considered inlier w.r.t. similarity transform, expressed in terms of pixel residual error. Max of absolute and fractional thresholds is used. Ratio of inliers that pass regular and strict thresholds are stored in CameraMotion. TODO: Just use lin_sim_inlier_threshold directly, however that recomputes the error, and requires regression testing. Using an extra fractional inlier threshold for now. Absolute in pixels.
Scaled by frame diameter.
TODO: Revisit after frame selection change. Absolute in pixels.
If any parameter of the input flow or estimated translation exceeds these thresholds we deem the motion INVALID.
Used in:
Absolute minimum of features present.
Max magnitude of the translation expressed w.r.t. frame diameter
Motion magnitude is only tested for if standard deviation of estimated translation exceeds threshold.
Max standard deviation of the estimated translation (normalized to frame diameter).
Maximum acceleration between frames. Specified relative to minimum velocity across two adjacent frames (absolute minimum of 0.001 is enforced, ~1 pix for 480p). If exceeded for one frame, the whole batch passed to EstimateMotionsParallel is labeled unstable.
Next tag: 17
Used in:
Standard normalized bounds and weights used to initialize salient points. See region_flow.proto for details.
If set, scales saliency_weight by flow magnitude.
Minimum number of features within a region to be considered salient. Only applicable for functions accepting RegionFlowFrames.
If set, only considers regions flagged as foreground.
Specifies roughly number of foreground features mapped to one mode, for mode to be considered salient.
Only returns the top N irls modes.
Mode finding is performed with a fraction radius of 10% of frame diameter by default.
We filter salient points along the temporal dimension only, keeping those that have sufficient support (in form of neighboring salient points). For every salient point in frame n, all points in frames [n - filtering_frame_radius, n + filtering_frame_radius] are tested, whether they support the current test point.
Fractional distance to be considered a supporting salient point for a test point.
Minimum number of supporting salient points that need to be present in order for a point to be considered an inlier.
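The temporal support test described above can be sketched as follows. This assumes Euclidean distance in normalized coordinates and that a point does not count as its own supporter; all names are hypothetical and the real implementation may differ.

```python
import math


def filter_salient_points(points_per_frame, frame_radius, max_distance, min_support):
    """Keep points with at least min_support nearby points in nearby frames."""
    num_frames = len(points_per_frame)
    filtered = []
    for n, points in enumerate(points_per_frame):
        # Frames [n - frame_radius, n + frame_radius], clamped to the sequence.
        first = max(0, n - frame_radius)
        last = min(num_frames - 1, n + frame_radius)
        kept = []
        for i, point in enumerate(points):
            support = 0
            for m in range(first, last + 1):
                for j, other in enumerate(points_per_frame[m]):
                    if m == n and j == i:
                        continue  # assumption: a point does not support itself
                    if math.dist(point, other) <= max_distance:
                        support += 1
            if support >= min_support:
                kept.append(point)
        filtered.append(kept)
    return filtered


frames = [[(0.1, 0.1)], [(0.1, 0.12)], [(0.9, 0.9)]]
print(filter_salient_points(frames, frame_radius=1, max_distance=0.05, min_support=1))
```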
Sigma in space (normalized domain).
Sigma in time (in frames).
Header for a multi-stream time series. Each packet in the associated stream is a vector<Matrix> of size num_streams. Each Matrix in the vector is as specified by the time_series_header field.
A proto2 calculator options for testing.
The output frame rate measured in frames per second.
Whether and what kind of header to place on the output stream.
Adds jitter to resampling if set, so that Google's sampling is not
If specified, output timestamps are aligned with base_timestamp.
If set, the output timestamps nearest to start_time and end_time
Format string used by string::Substitute to construct the output.
Used in:
Options for NodeChainSubgraph.
The type of the node. The node must have exactly one input stream and exactly one output stream.
How many copies of the node should be chained in series.
Options to NonMaxSuppression calculator, which performs non-maximum suppression on a set of detections.
Number of input streams. Each input stream should contain a vector of detections.
Maximum number of detections to be returned. If -1, then all detections are returned.
Minimum score of detections to be returned.
Jaccard similarity threshold for suppression -- a detection would suppress all other detections whose scores are lower and overlap by at least the specified threshold.
Whether to put empty detection vector in output stream.
Algorithms that can be used to apply non-maximum suppression.
Used in:
Only supports relative bounding box for weighted NMS.
During the overlap computation, which is used to determine whether a rectangle suppresses another rectangle, one can use the Jaccard similarity, defined as the ratio of the intersection over union of the two rectangles. Alternatively a modified version of Jaccard can be used, where the normalization is done by the area of the rectangle being checked for suppression.
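The two overlap measures described above can be sketched for axis-aligned rectangles. Rectangles are assumed here to be (xmin, ymin, xmax, ymax) tuples, and the function names are hypothetical rather than taken from the calculator.

```python
def area(rect):
    """Area of an (xmin, ymin, xmax, ymax) rectangle; 0 if degenerate."""
    return max(0.0, rect[2] - rect[0]) * max(0.0, rect[3] - rect[1])


def intersection_area(a, b):
    """Area of the overlap between two rectangles."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def jaccard(a, b):
    """Intersection over union of the two rectangles."""
    inter = intersection_area(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def modified_jaccard(candidate, other):
    """Intersection normalized by the area of the rectangle being checked."""
    candidate_area = area(candidate)
    if candidate_area == 0:
        return 0.0
    return intersection_area(candidate, other) / candidate_area


a = (0.0, 0.0, 2.0, 2.0)
b = (1.0, 0.0, 3.0, 2.0)
print(jaccard(a, b), modified_jaccard(a, b))
```

The modified measure is asymmetric: a small rectangle fully inside a large one scores 1.0 when it is the candidate, even though its plain Jaccard similarity with the large rectangle is low.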
Used in:
A normalized version of above Landmark proto. All coordinates should be within [0, 1].
Used in:
Group of NormalizedLandmark protos.
A rectangle with rotation in normalized coordinates. The values of box center location and size are within [0, 1].
Location of the center of the rectangle in image coordinates. The (0.0, 0.0) point is at the (top, left) corner.
Size of the rectangle.
Rotation angle is clockwise in radians.
Optional unique id to help associate different NormalizedRects to each other.
If set, we will attempt to automatically apply the orientation specified by the image's EXIF data when loading the image. Otherwise, the image data will be loaded as-is.
Quality of the encoding. An integer in the range (0, 100].
TODO: Consider renaming it to EncodedImage.
Pixel data encoded as JPEG.
Height of the image data under #1 once decoded.
Width of the image data under #1 once decoded.
Color space used.
Used in:
The 4-character code of the codec to encode the video.
The video format of the output video file.
The frame rate in Hz at which the video frames are output.
Dimensions of the video in pixels.
Stores the two channels of the flow field in raster order.
Settings specifying an output stream handler.
Used in: ,
Name of the registered output stream handler class.
Names of the input side packets for the handler specifically and distinct from the side packets for the calculator (but could be shared).
Options for the output stream handler.
When true, this calculator will drop received TICK packets if any input stream hasn't received a packet yet.
A PacketFactory creates a side packet.
Used in: ,
The name of the registered packet factory class.
The name of the output side packet that this packet factory creates.
DEPRECATED: The old name for output_side_packet.
The options for the packet factory.
Options used by a PacketFactory to create the Packet.
Used in:
(message has no fields)
Contains the packet frequency information.
Packet frequency (packets per second).
A label that identifies what this packet frequency is for. Eg. "Gaze", "Gesture", etc.
Options for PacketFrequencyCalculator.
Time window (in seconds) over which the packet frequency is computed. Must be greater than 0 and less than 100 seconds (in order to limit memory usage).
Text identifiers for the input streams.
The settings specifying a packet generator and how it is connected.
Used in:
The name of the registered packet generator class.
The names of the input side packets. The PacketGenerator can choose to access its input side packets either by index or by tag.
DEPRECATED(mgeorg) The old name for input_side_packet.
The names of the output side packets that this generator produces. The PacketGenerator can choose to access its output side packets either by index or by tag.
DEPRECATED(mgeorg) The old name for output_side_packet.
The options for the packet generator.
Options used by a PacketGenerator.
Used in:
(message has no fields)
Contains the latency information for a packet stream in mediapipe. The following are provided: 1. current latency, 2. running average, 3. histogram of latencies observed, 4. cumulative sum of latencies observed. NextId: 13
Current latency (delay in microseconds wrt a reference packet).
The latency histogram which stores the count recorded for each specified interval.
Number of intervals for the latency histogram output.
Size of the histogram intervals (in microseconds). The first interval is [0, interval_size_usec). The last interval extends to +inf.
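A minimal sketch of the binning rule above, with the first interval covering [0, interval_size_usec) and the last one absorbing everything larger. The function name is hypothetical.

```python
def latency_bin(latency_usec, interval_size_usec, num_intervals):
    """Return the histogram interval index for a latency in microseconds."""
    if latency_usec < 0:
        raise ValueError("latency must be non-negative in this sketch")
    index = latency_usec // interval_size_usec
    # The last interval extends to +inf, so clamp the index.
    return min(index, num_intervals - 1)


for latency in (0, 9, 10, 1000):
    print(latency, latency_bin(latency, interval_size_usec=10, num_intervals=4))
```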
Running average of latencies observed so far.
An identifier label for the packet.
Cumulative sum of individual packet latencies of all the packets output so far.
Number of intervals for the latency histogram output.
Interval size (in microseconds) for the histogram.
Reset time (in microseconds) for histogram and average. The histogram and running average are initialized to zero periodically based on the specified duration. Negative value implies never resetting the statistics.
Identifier labels for each input packet stream. The order of labels must correspond 1:1 with the input streams order. The labels are copied to the latency information output by the calculator.
The configuration for a PacketManager.
The output frame rate measured in frames per second. The closest packet in time in each period will be chosen. If there is no packet in the period then the most recent packet will be chosen (not the closest in time).
Whether and what kind of header to place on the output stream. Note, this is about the actual header, not the VIDEO_HEADER stream. If this option is set to UPDATE_VIDEO_HEADER then the header will also be parsed (updated) and passed along to the VIDEO_HEADER stream.
Flush last packet even if its timestamp is greater than the final stream timestamp.
Adds jitter to resampling if set, so that Google's sampling is not externally deterministic. When set, the randomizer will be initialized with a seed. Then, the first sample is chosen randomly (uniform distribution) among frames that correspond to timestamps [0, 1/frame_rate). Let the chosen frame correspond to timestamp t. The next frame is chosen randomly (uniform distribution) among frames that correspond to [t+(1-jitter)/frame_rate, t+(1+jitter)/frame_rate]. t is updated and the process is repeated. Valid values are in the range of [0.0, 1.0] with the default being 0.0 (no jitter). A typical value would be a value in the range of 0.1-0.25. Note that this does NOT guarantee the desired frame rate, but if the pseudo-random number generator does its job and the number of frames is sufficiently large, the average frame rate will be close to this value.
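The jittered sampling procedure above can be sketched on continuous timestamps. This is a simplification: the calculator chooses among actual frames, whereas this sketch draws target times directly. The function name, the `duration` parameter and the seed handling are assumptions.

```python
import random


def jittered_sample_times(frame_rate, jitter, duration, seed=0):
    """Draw sample times (in seconds) with uniformly jittered spacing."""
    if not 0.0 <= jitter <= 1.0:
        raise ValueError("jitter must be within [0.0, 1.0]")
    rng = random.Random(seed)
    period = 1.0 / frame_rate
    # First sample: uniform in [0, 1 / frame_rate).
    t = rng.random() * period
    times = []
    while t < duration:
        times.append(t)
        # Next sample: uniform in [t + (1 - jitter) * period, t + (1 + jitter) * period].
        t += rng.uniform((1.0 - jitter) * period, (1.0 + jitter) * period)
    return times


times = jittered_sample_times(frame_rate=10.0, jitter=0.25, duration=5.0, seed=1)
print(len(times), len(times) / 5.0)  # sample count and average rate
```

The spacing has an expected value of one period, which is why the average rate tends toward frame_rate over many samples without being guaranteed for any particular run.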
Enables reflection when applying jitter. This option is ignored when reproducible_sampling is true, in which case reflection will be used. New use cases should use reproducible_sampling = true, as jitter_with_reflection is deprecated and will be removed at some point.
If set, enables reproducible sampling, allowing frames to be sampled without regard to where the stream starts. See packet_resampler_calculator.h for details. This enables reflection (ignoring jitter_with_reflection setting).
If specified, output timestamps are aligned with base_timestamp. Otherwise, they are aligned with the first input timestamp. In order to ensure that the output timestamps are reproducible: with round_limits = false, the bounds for input timestamps must include [start_time - period / 2, end_time + period / 2]; with round_limits = true, the bounds for input timestamps must include [start_time - period, end_time + period], where period = 1 / frame_rate. For example, in PacketResamplerCalculatorOptions specify "start_time: 3000000", and in MediaDecoderOptions specify "start_time: 2999950".
If specified, only outputs at/after start_time are included.
If specified, only outputs before end_time are included.
If set, the output timestamps nearest to start_time and end_time are included in the output, even if the nearest timestamp is not between start_time and end_time.
Used in:
Do not output a header, even if the input contained one.
Pass the header, if the input contained one.
Update the frame rate in the header, which must be of type VideoHeader.
Tests that the tags used to encode the timestamp do not interfere with proto tags.
The tag below = 1777 | (1 << 28).
The period (in microseconds) specifies the temporal interval during which only a single packet is emitted in the output stream. Has subtly different semantics depending on the thinner type, as follows. Async thinner: this option is a refractory period -- once a packet is emitted, we guarantee that no packets will be emitted for period ticks. Sync thinner: the period specifies a temporal interval during which only one packet is emitted. The emitted packet is guaranteed to be the one closest to the center of the temporal interval (no guarantee on how ties are broken). More specifically, intervals are centered at start_time + i * period (for non-negative integers i). Thus, each interval extends period/2 ticks before and after its center. Additionally, in the sync thinner any packets earlier than start_time are discarded and the thinner calls Close() once timestamp equals or exceeds end_time.
Packets before start_time and at/after end_time are discarded. Additionally, for a sync thinner, start time specifies the center of time intervals as described above and therefore should be set explicitly.
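The sync thinner rule above can be sketched offline over a list of integer timestamps. Two details are assumptions, since the text leaves them open: a timestamp exactly on an interval boundary is assigned to the later interval, and ties in distance keep the earlier packet. The function name is hypothetical.

```python
def thin_sync(timestamps, start_time, end_time, period):
    """Keep, per interval, the timestamp closest to the interval center."""
    best = {}
    for ts in timestamps:
        if ts < start_time or ts >= end_time:
            continue  # outside [start_time, end_time)
        # Interval i is centered at start_time + i * period and extends
        # period / 2 to each side; integer math avoids rounding issues.
        i = (2 * (ts - start_time) + period) // (2 * period)
        center = start_time + i * period
        if i not in best or abs(ts - center) < abs(best[i] - center):
            best[i] = ts
    return [best[i] for i in sorted(best)]


print(thin_sync([0, 10, 40, 95, 110, 260], start_time=0, end_time=1000, period=100))
```

A real calculator works on a stream and cannot look ahead indefinitely, so it would have to buffer one candidate per interval; this sketch only shows which packets survive.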
If not specified, set to 0 for SYNC type, and set to Timestamp::Min() for ASYNC type.
Set to Timestamp::Max() if not specified.
Whether the timestamps of packets emitted by sync thinner should correspond to the center of their corresponding temporal interval. If false, packets are emitted using their original timestamp (as in async thinner).
If true, update the frame rate in the header, if it's available, to an estimated frame rate due to the sampling.
Used in:
Asynchronous thinner, described below [default].
Synchronous thinner, also described below.
Captures additional information about a RegionFlowFeature's surrounding patch. Using MotionEstimation::RetrieveRegionFlowFeatureList or ComputeRegionFlowFeatureDescriptors the patch descriptor has the following layout: 9 dimensional: 3 mean intensities, 3x3 covariance matrix (only the upper half (6 elems) is stored, in column major order), i.e. indices for data in patch descriptor refer to:
mean: 0 1 2
covariance: 3 4 5
              6 7
                8
Used in:
The actual feature descriptor.
Several intensity matches computed from equal percentiles of matching patch pairs. No number or particular ordering is assumed.
Configs for the profiler for a calculator. Not applicable to subgraphs.
Used in: ,
Size of the runtimes histogram intervals (in microseconds) to generate the histogram of the Process() time. The last interval extends to +inf. If not specified, the interval is 1000000 usec = 1 sec.
Number of intervals to generate the histogram of the Process() runtime. If not specified, one interval is used.
TODO: clean up after migration to MediaPipeProfiler. DEPRECATED: If true, the profiler also profiles the input output latency. Should be true only if the packet timestamps correspond to the microseconds wall time from epoch.
If true, the profiler starts profiling when graph is initialized.
If true, the profiler also profiles the stream latency and input-output latency. No-op if enable_profiler is false.
If true, the profiler uses packet timestamp (as production time and source production time) for packets added by calling CalculatorGraph::AddPacketToInputStream(). If false, uses profiler's clock.
The maximum number of trace events buffered in memory. The default value buffers up to 20000 events.
Trace event types that are not logged.
The output directory and base-name prefix for trace log files. Log files are written to: StrCat(trace_log_path, index, ".binarypb")
The number of trace log files retained. The trace log files are named "trace_0.log" through "trace_k.log". The default value specifies 2 output files retained.
The interval in microseconds between trace log output. The default value specifies trace log output once every 0.5 sec.
The interval in microseconds between TimeNow and the highest times included in trace log output. This margin allows time for events to be appended to the TraceBuffer.
Deprecated, replaced by trace_log_instant_events.
The number of trace log intervals per file. The total log duration is: trace_log_interval_usec * trace_log_file_count * trace_log_interval_count. The default value specifies 10 intervals per file.
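As a worked example of the duration formula above, using the defaults stated in this section (0.5 sec between outputs, 2 files retained, 10 intervals per file). The function name is hypothetical.

```python
def total_trace_log_duration_usec(
    trace_log_interval_usec, trace_log_file_count, trace_log_interval_count
):
    """Total duration covered by the retained trace log files."""
    return trace_log_interval_usec * trace_log_file_count * trace_log_interval_count


# Defaults as described in the text: 500000 usec, 2 files, 10 intervals per file.
duration = total_trace_log_duration_usec(500_000, 2, 10)
print(duration, "usec =", duration / 1_000_000, "sec")
```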
An option to turn ON/OFF writing trace files to disk. Saving trace files to disk is enabled by default.
If true, tracer timing events are recorded and reported.
False specifies an event for each calculator invocation. True specifies a separate event for each start and finish time.
Sigma for color difference.
Determines how fast confident values can propagate. Filters are normalized, such that confidence dissipates quickly instead of propagating. To ensure confidence propagates the importance weight is scaled by the scalars specified below. Larger values yield quicker propagation.
Above bilateral sigma is scaled at each level by the specified scale (for push and pull phase). This is because iterative downsampling of the guidance image introduces errors, making bilateral weighting increasingly erroneous.
Message storing min value and max value for normalization in all channels.
For all channels.
A Region can be represented in each frame as a set of scanlines (compressed RLE, similar to rasterization of polygons). For each scanline with y-coordinate y, we save (possibly multiple) intervals of occupied pixels represented as a pair [left_x, right_x].
Used in: ,
Intervals are always sorted by y-coordinate. Therefore, a region occupies a set of scanlines ranging from interval(0).y() to interval(interval_size() - 1).y(). Note: In video, at some scanlines no interval might be present.
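The scanline representation can be sketched as a sorted list of (y, left_x, right_x) intervals. The class and function names below are hypothetical, and the sketch assumes that right_x is inclusive, which the closed-bracket notation [left_x, right_x] suggests but the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Interval:
    y: int        # scanline (row) of the interval
    left_x: int   # first occupied pixel
    right_x: int  # last occupied pixel (assumed inclusive)


def region_area(intervals: List[Interval]) -> int:
    """Number of pixels covered by all intervals of the region."""
    return sum(iv.right_x - iv.left_x + 1 for iv in intervals)


def scanline_range(intervals: List[Interval]) -> Optional[Tuple[int, int]]:
    """First and last scanline, relying on intervals being sorted by y."""
    if not intervals:
        return None
    return intervals[0].y, intervals[-1].y


# Two intervals on scanline 2, none on scanline 3, one on scanline 4.
region = [Interval(2, 1, 3), Interval(2, 6, 7), Interval(4, 0, 0)]
print(region_area(region), scanline_range(region))
```

Scanline 3 carries no interval here, which matches the note that some scanlines inside the range may be empty.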
Used in:
NOTE: Despite its name, this calculator uses QResampler, which supersedes RationalFactorResampler.
target_sample_rate is the sample rate, in Hertz, of the output stream. Required. Must be greater than 0.
Set to false to disable checks for jitter in timestamp values. Useful with live audio input.
Parameters for initializing QResampler. See QResampler for more details.
Used in:
Kernel radius in units of input samples.
Anti-aliasing cutoff frequency in Hertz. A reasonable setting is 0.45 * min(input_sample_rate, output_sample_rate).
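As a small illustration of the suggested cutoff, the rule of thumb above can be written as a helper. The function name is hypothetical and the sample rates are example values only.

```python
def suggested_cutoff_hz(input_sample_rate: float, output_sample_rate: float) -> float:
    """Rule-of-thumb anti-aliasing cutoff: 0.45 times the lower of the two rates."""
    return 0.45 * min(input_sample_rate, output_sample_rate)


# Example: resampling 48 kHz audio down to 16 kHz.
print(suggested_cutoff_hz(48_000.0, 16_000.0))
```

The cutoff tracks the lower of the two rates, so it stays below that rate's Nyquist frequency whether the calculator is upsampling or downsampling.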
The Kaiser beta parameter for the kernel window.
Selects which channel of the MASK input to use for masking.
Color to blend into input image where mask is > 0. The blending is based on the input image luminosity.
Swap the meaning of mask values for foreground/background.
Whether to use the luminance of the input image to further adjust the blending weight, to help preserve image textures.
Used in:
A rectangle with rotation in image coordinates.
Location of the center of the rectangle in image coordinates. The (0, 0) point is at the (top, left) corner.
Size of the rectangle.
Rotation angle is clockwise in radians.
Optional unique id to help associate different Rects to each other.
Whether the rendered rectangle should be filled.
Line color or filled color of the rectangle.
Thickness of the line (applicable when the rectangle is not filled).
Whether the rendered rectangle should be an oval.
Multiplier to apply to the rect size. If `thickness` for RenderData primitives was defined for an object (e.g. pose, hand or face) of size `A`, then the multiplier should be `1/A`. This means that when the actual object size in the image is `B`, all RenderData primitives are scaled by the factor `B/A`.
Scaling factor along the side of a rotated rect that was aligned with the X and Y axis before rotation respectively.
Additional rotation (counter-clockwise) around the rect center either in radians or in degrees.
Shift along the side of a rotated rect that was aligned with the X and Y axis before rotation respectively. The shift is relative to the length of corresponding side. For example, for a rect with size (0.4, 0.6), with shift_x = 0.5 and shift_y = -0.5 the rect is shifted along the two sides by 0.2 and -0.3 respectively.
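The relative shift in the example above works out as follows. This sketch only computes the displacement along the rect's own (pre-rotation) sides; mapping that displacement into image coordinates for a rotated rect is left out, and the function name is hypothetical.

```python
from typing import Tuple


def shift_along_sides(width: float, height: float,
                      shift_x: float, shift_y: float) -> Tuple[float, float]:
    """Displacement along the rect's X-aligned and Y-aligned sides.

    Each shift is relative to the length of the corresponding side.
    """
    return shift_x * width, shift_y * height


# The example from the text: size (0.4, 0.6), shift_x = 0.5, shift_y = -0.5.
print(shift_along_sides(0.4, 0.6, 0.5, -0.5))
```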
Change the final transformed rect into a square that shares the same center and rotation with the rect, and with the side of the square equal to either the long or short side of the rect respectively.
Next tag: 67
Used in:
Features are binned into grids of different resolutions (see fast_estimation_block_size below) and retained if they survive a localized translation-based RANSAC algorithm and the survivors number at least min_feature_inliers. Must be at least 3!
Relative number of inlier features w.r.t. average number of features per grid bin. Maximum of both thresholds is used as actual threshold.
Pre-blur before computing features to reduce noise. Set to zero for no blurring.
Number of RANSAC rounds to estimate per region flow vector. This could be adaptive, but the required number of rounds is so low that estimating the bound is more costly than just running it a fixed number of times.
Error thresholds for a feature to be considered as an inlier in pixel-distance. The max of all three thresholds below is used as the actual threshold. Absolute in pixels.
Scaled w.r.t. frame diameter.
Scaled w.r.t model estimated during each RANSAC round.
Returns for each grid only the top N inlier sets.
For debugging purposes, uses all tracked features regardless of the above setting.
Block size in pixels. If a fractional block_size is used (0 < size < 1), it is interpreted as a fraction of the image dimensions. We use 4 blocks in each dimension by default.
Minimum block size in pixels (larger dimension) to perform fast estimation on. Pyramid levels are allocated such that block_size * 0.5^(level - 1) = min_block_size. At least two levels are used.
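Solving the relation above for the level count gives level = 1 + log2(block_size / min_block_size). The sketch below applies that and enforces the two-level minimum. How a non-integer result is rounded is not stated in the text; rounding down is an assumption made here, and the function name is hypothetical.

```python
import math


def pyramid_levels(block_size: float, min_block_size: float) -> int:
    """Levels such that block_size * 0.5^(level - 1) reaches min_block_size."""
    if block_size <= 0 or min_block_size <= 0:
        raise ValueError("block sizes must be positive")
    exact = 1.0 + math.log2(block_size / min_block_size)
    # Assumption: fractional level counts are rounded down.
    # The text guarantees that at least two levels are used.
    return max(2, math.floor(exact))


print(pyramid_levels(400, 100))  # halving twice: 400 -> 200 -> 100
print(pyramid_levels(100, 100))  # clamped to the two-level minimum
```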
We use overlapping versions of the grid; the next parameter specifies how many in each dimension (the total is therefore the value squared!).
Flow features with motion above this threshold (w.r.t. frame diameter) are rejected.
Flow features that have a motion that is larger than median_magnitude_bounds times the median magnitude are discarded. If set to zero, test is not enforced.
If this option is activated, feature's irls weight is initialized to the inverse of its computed flow.
Specify the size of either dimension here, the frame will be downsampled to fit downsampling_size.
If set, we will force the computed downsampling factor to be the nearest integer, resulting in faster downsampling. This will have no effect for DOWNSAMPLE_TO_INPUT_SIZE, DOWNSAMPLE_BY_FACTOR, and DOWNSAMPLE_BY_SCHEDULE, which should have exact values defined.
Used if downsample_mode is DOWNSAMPLE_BY_SCHEDULE.
Minimum number of good features that we require to be present. Without good features, the estimated motion models will do more harm than good, so it is better to use simply the identity transform for this frame, and set the flag unstable_models to true in RegionFlow.
We also require features to cover a minimum percentage area of the frame. We use downsampling and plot each feature by a 1 in a grid, this is equivalent to plotting each feature by a rectangle in the original frame.
Grid size for above min feature cover.
Computes blur score for each frame. Score is proportional to amount of blur present in a frame, i.e. higher scores reflect more blurred frames. Note that the score is dependent on the gradient distribution of the image content, i.e. the score itself is rather meaningless but needs to be compared to scores of neighboring frames.
Radius of patch descriptor computed during RetrieveRegionFlowFeatureList call.
Minimum distance from image border. Must be greater or equal to patch_descriptor_radius.
Corner response is scaled by scalar below and normalized to lie within [0, 1], where 0 is low corner score and 1 high corner score.
Verifies reliability of features by a back-tracking operation from the matched location. If the returned location is within verification_distance, the feature is accepted; otherwise it is discarded.
If set, consistency of long features is verified (in case tracking_policy is set to POLICY_LONG_FEATURES) by extracting a patch around the feature during the very first observation and comparing the matching patches along the long feature trajectory via SSD. If the difference is above the long_feature_verification_threshold the feature is removed.
Maximum average per pixel error (in L1 norm) in the normalized intensity domain for matching patches to be considered to be consistent.
Long features are expected to have limited acceleration over time. If acceleration exceeds the specified value, then based on the setting in verify_long_feature_acceleration either: a) verify_long_feature_acceleration = false: A new track is started instead of continuing the old one. The track itself is not removed in this case. b) verify_long_feature_acceleration = true: The track is flagged for verification by a back-tracking operation from the matched location. If the track fails the verification test it is discarded. This only triggers if at least verify_long_feature_trigger_ratio of features have been flagged; otherwise option a is used.
If true, histogram equalization is performed to the input image sequence before registration.
If true, synthetic region flows with zero motion are used for all (or just the first) frame.
Optional gain correction before tracking features. Improves robustness when lighting is changing.
If set performs gain correction by simply equalizing mean intensity between frames, instead of using ToneEstimation.
If the multiple hypothesis flag is set, features are tracked both with and without gain correction, and the hypothesis with more inliers is selected.
This flag, when used together with the multiple hypotheses flag, specifies that gain correction should increase the number of inliers by at least this fraction for it to be used instead of default tracking.
If set, always uses the brighter frame as reference. This is the preferred direction of correction, to avoid overexposed regions from being corrected which leads to spurious matches.
Only performs gain correction if number of tracked features falls under specified ratio (w.r.t. previous frame). Set to zero, to always perform gain correction if requested.
Gain correction is based on a grid of zero motion features, independent of the underlying motion. Fractional parameter specifies resolution of the grid w.r.t. frame size.
Bounds for the estimated model. If not set externally, will be set based on GainCorrectMode.
Image format of the input.
The descriptor extractor type used.
Whether to compute derivatives when building the pyramid. When set to true, it's building a Laplacian pyramid. When set to false, it's building a Gaussian pyramid.
Used in:
Blur score is only computed over image regions of high cornerness (as blur in any direction will always alter these regions). First, the corner image (smallest eigenvalue of 2nd moment matrix) is box filtered, and then thresholded.
Specifies relative (w.r.t. maximum) and absolute cornerness threshold for threshold operation.
Blur score is defined as 1.0 / <median cornerness>, where <median cornerness> is the n-th percentile of the cornerness evaluated over the image regions of high cornerness as specified above.
Used in:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.370.4395&rep=rep1&type=pdf
Downsampling schedule. Frame sizes up to which a particular downsampling factor is applied. Factor chosen by comparing actual frame area against standard area (standard_width * standard_height), where standard_width = 16/9 * standard_height.
Used in:
For <= 360p.
For <= 480p.
For <= 720p.
>= 720p.
We support down-sampling of an incoming frame before running the resolution dependent part of the region flow computation (feature extraction and tracking if desired). Note that in all downsampling modes except for DOWNSAMPLE_TO_INPUT_SIZE, for uneven dimensions after downsampling, we always round up to the nearest even dimension, i.e. 350p with a downsample_factor of 2.0 would expect an input of size 176p.
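The even-dimension rule above can be sketched as follows. The helper name is hypothetical, and rounding a fractional quotient up before the parity check is an assumption; the text only specifies that odd results are bumped to the next even value.

```python
import math


def downsampled_dimension(dimension: int, downsample_factor: float) -> int:
    """Dimension after downsampling, rounded up to the nearest even value."""
    # Assumption: a fractional quotient is rounded up to a whole pixel first.
    scaled = math.ceil(dimension / downsample_factor)
    # Odd sizes are bumped to the next even size.
    return scaled + (scaled % 2)


print(downsampled_dimension(350, 2.0))  # the 350p example from the text
print(downsampled_dimension(720, 2.0))  # already even, left unchanged
```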
Used in:
No downsampling.
Downsizes the input frame such that frame_size == downsampling_size, where frame_size := max(width, height).
Downsizes frame by pre-defined factor, downsample_factor below.
Downsampling based on downsampling schedule, see DownsampleSchedule below for details.
Downsizes the input frame such that frame_size == downsampling_size, where frame_size := min(width, height).
Input frame is assumed to be already downsampled by the factor specified by downsample_factor below. For example if the original frame is 720p, and downsample_factor is set to 2.0, then we expect as input 360p.
Used in:
Uses default or user supplied bounds, i.e. gain_bias_bounds is left untouched.
Uses defaults for video (most strict).
Uses most relaxed settings to track across HDR frames, taken at different exposures.
More relaxed than video but stricter
Supported image formats. All images are converted to grayscale before processing. These image formats only concern AddImage. IMPORTANT: All the Retrieve* methods expect RGB when the descriptors are computed.
Used in:
Determines how irls weights for computed features are initialized. In general, more stable features are given higher weight.
Used in:
All weights equal 1
Feature's irls weight is initialized to a value in [0, 2] indicating how consistent the feature's motion is w.r.t. neighboring features (high values = very consistent). Determined by counting how often a feature is part of the inlier set for a particular bin.
Determines how/if visual consistency is computed. If activated, computes the absolute *change* in visual difference between two adjacent frame pairs, i.e. the modulus of the 2nd derivative of the frame appearance. Stores result in RegionFlowFeatureList::visual_consistency.
Used in:
Computation of visual consistency is only performed if activated.
Incoming color or gray scale image is scaled to a tiny square image of the specified dimension. Used to compare adjacent images via SSD.
Tracked feature at location (x,y) with flow (dx, dy) and patch based error (sum of absolute value of intensity difference). Next tag: 19
Used in: ,
Features that belong to the same feature track are assigned a unique id and are identified via it. Note, this id is only unique within the lifetime of a RegionFlowComputation object. That is, if distribution or parallelization using multiple instances was used, the ids are only unique within that instance context.
no id.
Tracking error as patch intensity residual (SSD).
Inverse of registration error (in pixels), after parametric motion model fitting. Values are in [0, 1e6]. Low values correspond to outliers, high values to inliers. Set by MotionEstimation::EstimateMotions*
Corner response (computed as minimum eigenvalue of block filtered 2nd moment matrix).
Patch feature descriptors. *For internal use only*. External clients should not rely on their contents.
Internal datastructure used temporarily during temporal IRLS smoothing.
Optional label for debugging purposes.
Unique feature id per RegionFlowComputation object.
octave (pyramid layer) from which the keypoint has been extracted
Feature descriptor for the current feature.
Flags indicating specific statuses.
Used for long feature tracks if track id
Encapsulates a list of features with associated flow. Can be extracted from RegionFlow via GetRegionFlowFeatureList declared in region_flow.h. This is the essential (additional) information required by Cropper using wobble_suppression with displacements. Next tag: 14
Used in:
Set from corresponding RegionFlowFrame field.
Records the minimum distance from the image border for each feature and matching feature (if enforced > 0).
Set from corresponding RegionFlowFrame field.
If set, indicates, that features represent long tracks, i.e. each feature has a valid track_id() >= 0.
If long_tracks, stores number of long feature tracks that got rejected in this frame, as their patches were deemed inconsistent with the track's very first extracted patch.
Measures visual consistency between adjacent frames. In particular, stores the absolute *change* in visual difference between two adjacent frame pairs, i.e. the modulus of the 2nd derivative of the frame appearance. Normalized w.r.t. number of channels and total pixels of the underlying frame. In particular for sudden changes (e.g. shot boundaries) this value will be significantly non-zero (> 0.05). Negative value per default indicates no consistency has been computed.
Timestamp in microseconds of the underlying frame, that is the frame for which the source features (not matching features) were computed.
Denotes the frame that flow was computed w.r.t., locally to the current frame. For example, if current frame is N, N + match_frame is the matching frame that flow was computed to. Values < 0 indicate backward tracking, while values > 0 indicate forward tracking. By default, for empty feature lists, matching frame is the same as current frame, i.e. match_frame = 0.
Set, if frame is estimated to be an exact duplicate of the previous frame.
Stores all the tracked ids that have been actively discarded in this frame. This information will be populated via RegionFlowFeatureList, so that downstream modules can receive it and use it to avoid misjudging tracking continuity. Discard reasons: (1) A tracked feature has too long a track, which might create drift. (2) A tracked feature is in a highly dense area, which provides little value.
RegionFlowFrame is an optical flow representation where each region has a consistent optical flow (adheres to local translational model). Regions are arranged in a regular grid according to BlockDescriptor. Next tag: 11.
Sorted by id for quick lookup.
Total number of features in all RegionFlow's.
If set, indicates that the frame's region flow is unstable. (not enough features or coverage too low).
Blur score of the current frame is defined as the n-th percentile of the cornerness of the input frame evaluated over regions of high cornerness. For details see BlurScoreOptions in region_flow_computation.proto. The actual value is pretty meaningless, but relative to the blur score of other frames one can detect blurry frames, e.g. by a 'significant' local maximum in a sequence of blur_scores.
Region flow is estimated using a grid of equal sized bins as regions. BlockDescriptor specifies size of bins/blocks.
Used in:
Next tag: 8
Used in:
Mean anchor point (centroid) of flow vector and mean flow.
Used in:
The RenderAnnotation can be one of the below formats.
Thickness for drawing the annotation.
Color for drawing the annotation. For FilledRectangle and FilledOval, this color is used only for drawing the boundary.
A hint regarding what this annotation is for. Should be unique across all annotation types.
Used in:
The arrow head will be drawn at (x_end, y_end).
Used in:
Color to fill in the oval.
Used in:
Color to fill in the rectangle.
Used in:
Color to fill in the rounded rectangle.
Used in:
Linearly interpolate between color1 and color2 along the line.
Used in:
Used in:
Used in: ,
An oval is specified by the rectangle that encloses the oval. For example, a circle with center at (x, y) and radius r can be specified as a Rectangle with left = x - r, top = y - r, right = x + r, and bottom = y + r (so that width = height = 2 * r).
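A short sketch of the enclosing rectangle for a circle, using the Rectangle convention in which left/top are the coordinates of the top-left corner and right/bottom those of the bottom-right corner. The function name and the dictionary layout are illustrative only and not part of any API.

```python
from typing import Dict


def circle_to_rectangle(x: float, y: float, r: float) -> Dict[str, float]:
    """Enclosing rectangle of a circle centered at (x, y) with radius r."""
    return {
        "left": x - r,    # x of the top-left corner
        "top": y - r,     # y of the top-left corner
        "right": x + r,   # x of the bottom-right corner
        "bottom": y + r,  # y of the bottom-right corner
    }


rect = circle_to_rectangle(50.0, 40.0, 10.0)
print(rect, rect["right"] - rect["left"], rect["bottom"] - rect["top"])
```

Both the width and the height of the resulting rectangle equal 2 * r.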
Used in:
Used in: , , ,
Left and top refer to the x and y coordinates of the top-left corner of rectangle, whereas right and bottom refer to the x and y coordinates of the bottom-right corner of rectangle.
Rotation in radians.
Used in: ,
A rounded rectangle is specified by a rectangle and a corner radius to round each corner by. A corner radius of 0 implies a standard non-rounded rectangle (i.e. sharp edges) but as the radius increases proportionally to the width and height of the overall rectangle size, the corners increasingly round.
The radius of the round corners.
Use one of the following: -1: a filled line (FILLED); 4: a 4-connected line (LINE_4); 8: an 8-connected line (LINE_8); 16: an antialiased line (LINE_AA).
Used in: ,
The location to render the text. Left and baseline refer to the x and y coordinates of the start location of text respectively.
The height of the text from top to baseline. When normalized=true, font height is specified wrt the image height.
Specifies the font for the text. Font must be one of the following from OpenCV: cv::FONT_HERSHEY_SIMPLEX (0), cv::FONT_HERSHEY_PLAIN (1), cv::FONT_HERSHEY_DUPLEX (2), cv::FONT_HERSHEY_COMPLEX (3), cv::FONT_HERSHEY_TRIPLEX (4), cv::FONT_HERSHEY_COMPLEX_SMALL (5), cv::FONT_HERSHEY_SCRIPT_SIMPLEX (6), cv::FONT_HERSHEY_SCRIPT_COMPLEX (7).
Options to center text around the anchor point (left, baseline) by taking into account font shape, size and text length (e.g., [left, baseline] represent [center_x, center_y]).
A RenderData is a collection of multiple RenderAnnotations. For example, a face can be rendered using a group of annotations: a bounding box around the face (rectangle) and annotations for various face parts such as eyes, nose etc (2D points).
An optional string that uniquely identifies this class of annotations.
An optional viewport to which this set of annotations are intended to be rendered. If left unset, the annotations are meant to render overtop of the existing camera feed in the "main" viewport. If set, the annotations are to be rendered in a separate viewport.
Represents a destination viewport to render annotations into, when specified in RenderData.
Used in:
A unique identifier for this viewport.
The width and height of this viewport in absolute pixels. Normalized coordinates on annotations destined for this viewport are normalized relative to these absolute pixel dimensions. Camera feeds destined for this viewport will be rescaled to match these dimensions. Note: It is not expected that mid-stream resizing should be possible -- the visualizer is expected to use the first dimensions it sees for a given viewport and ignore any subsequent changes.
Set to true if this viewport should render its annotations overtop of a (rescaled to width/height) copy of the camera feed.
Options to generate anchors for Retina object detection models.
Size of input images.
Min and max scales for generating anchor boxes on feature maps.
The offset for the center of anchors. The value is in the scale of stride. E.g. 0.5 meaning 0.5 * |current_stride| in pixels.
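To make the "scale of stride" wording concrete, the sketch below places an anchor center inside a feature-map cell. The function names are hypothetical, and dividing by the input size to obtain a normalized coordinate is an assumption rather than something the text states.

```python
def anchor_center_px(cell_index: int, stride: int, anchor_offset: float) -> float:
    """Anchor center in pixels: the offset is expressed in units of the stride."""
    return (cell_index + anchor_offset) * stride


def anchor_center_normalized(cell_index: int, stride: int,
                             anchor_offset: float, input_size: int) -> float:
    """Same center, divided by the input size (assumed normalization)."""
    return anchor_center_px(cell_index, stride, anchor_offset) / input_size


# With an offset of 0.5 the center sits 0.5 * stride pixels into the cell.
print(anchor_center_px(0, 8, 0.5))
print(anchor_center_px(3, 8, 0.5))
print(anchor_center_normalized(3, 8, 0.5, 64))
```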
Number of output feature maps to generate the anchors on.
Sizes of output feature maps to create anchors. Either feature_map size or stride should be provided.
Stride of each output feature map.
List of different aspect ratios used to generate anchors.
A boolean to indicate whether the fixed 3 boxes per location is used in the lowest layer.
An additional anchor is added with this aspect ratio and a scale interpolated between the scale for a layer and the scale for the next layer (1.0 for the last layer). This anchor is not included if this value is 0.
Whether to use fixed width and height (e.g. both 1.0f) for each anchor. This option can be used when the predicted anchor width and height are in pixels.
Counterclockwise rotation.
(message has no fields)
Used in:
For TYPE_INCLUDE: During retargeting and stabilization, salient points introduce constraints that will try to keep the normalized location in the rectangle frame_size - normalized bounds. For this, soft constraints are used; therefore the weight specifies how "important" the salient point is (higher is better). In particular, for each point p the retargeter introduces two pairs of constraints of the form: x - slack < width - right and x + slack > 0 + left, with slack > 0, where the weight specifies the importance of the slack. For TYPE_EXCLUDE_*: Similar to above, but constraints are introduced to keep the point to the left of the left bound OR the right of the right bound. In particular: x - slack < left OR x + slack >= right. Similar to above, the weight specifies the importance of the slack. Note: Choosing too high a weight can lead to jerkiness, as the stabilization essentially starts tracking the salient point.
Used in:
Normalized location of the point (within domain [0, 1] x [0, 1]).
Salient point type. By default we try to frame the salient point within the bounding box specified by left, bottom, right, top. Alternatively, one can choose to exclude the point. For details, see discussion above.
Bounds are specified in normalized coordinates [0, 1], FROM the specified border. Opposing bounds (e.g. left and right) may not add to values larger than 1. Default bounds center the salient point within the center third of the frame.
In addition, a salient point can represent a region of interest, defined as an ellipse of size norm_major x norm_minor (normalized to the [0, 1] domain) whose orientation is given by angle (in radians in [0, pi]). Due to the aspect ratio change of the normalized domain, it is recommended that transformations to other domains are done via the ScaleSalientPoint function.
Angle of major axis with x-axis (counter-clockwise, in radians).
Used in:
Aggregates SalientPoint's for a frame.
Order of operations. 1) Crop the image to fit within min_aspect_ratio and max_aspect_ratio. 2) Scale and convert the image to fit inside target_width x target_height using the specified scaling algorithm. (maintaining the aspect ratio if preserve_aspect_ratio is true). The output width and height will be divisible by 2, by default. It is possible to output width and height that are odd numbers when the output format is SRGB and the aspect ratio is left unpreserved. See scale_to_multiple_of for details.
Target output width and height. The final output's size may vary depending on the other options below. If unset, use the same width or height as the input. If only one is set then determine the other from the aspect ratio (after cropping). The output width and height will be divisible by 2, by default.
If true, the image is scaled up or down proportionally so that it fits inside the box represented by target_width and target_height. Otherwise it is scaled to fit target_width and target_height completely. In any case, the aspect ratio that is preserved is that after cropping to the minimum/maximum aspect ratio. Additionally, if true, the output width and height will be divisible by 2.
If ratio is positive, crop the image to this minimum and maximum aspect ratio (preserving the center of the frame). This is done before scaling. The string must contain "/", so to disable cropping, set both to "0/1". For example, for a min_aspect_ratio of "9/16" and max of "16/9" the following cropping will occur: 1920x1080 (which is 16:9) is not cropped; 640x1024 (which is 10:16) is not cropped; 640x320 (which is 2:1) is cropped to 568x320 (just under 16/9); 96x480 (which is 1:5) is cropped to 96x170 (just over 9/16). The resultant frame will always be between (or at) the min_aspect_ratio and max_aspect_ratio.
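The four cropping examples above can be reproduced with exact rational arithmetic. This is a minimal sketch, not the calculator's implementation: the function name is hypothetical, and rounding the cropped dimension down is an assumption chosen because it matches the listed results.

```python
import math
from fractions import Fraction
from typing import Tuple


def crop_to_aspect(width: int, height: int,
                   min_aspect_ratio: str = "9/16",
                   max_aspect_ratio: str = "16/9") -> Tuple[int, int]:
    """Crop size that keeps width/height within [min, max]; "0/1" disables a bound."""
    lo, hi = Fraction(min_aspect_ratio), Fraction(max_aspect_ratio)
    ratio = Fraction(width, height)
    if hi > 0 and ratio > hi:
        # Too wide: trim the width (assumed to round down).
        width = math.floor(height * hi)
    elif lo > 0 and ratio < lo:
        # Too tall: trim the height (assumed to round down).
        height = math.floor(width / lo)
    return width, height


for size in [(1920, 1080), (640, 1024), (640, 320), (96, 480)]:
    print(size, "->", crop_to_aspect(*size))
```

Using `Fraction` avoids floating-point error when comparing a frame that sits exactly on a bound, such as 1920x1080 against "16/9".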
If unset, use the same format as the input. NOTE: in the current implementation, the output format (either specified in the output_format option or inherited from the input format) must be SRGB. It can be YCBCR420P if the input_format is also the same.
The upscaling algorithm to use. The default is to use CUBIC. Note that downscaling unconditionally uses DDA; see image_processing::AffineGammaResizer for documentation.
The output image will have this alignment. If set to zero, then any alignment could be used. If set to one, the output image will be stored contiguously.
Set the alignment padding area to deterministic values (as opposed to possibly leaving it as uninitialized memory). The padding is the space between the pixel values in a row and the end of the row (which may be different due to alignment requirements on the length of a row).
Applies sharpening for downscaled images as post-processing. See image_processing::AffineGammaResizer for documentation.
If input_format is YCBCR420P, input packets contain a YUVImage. If input_format is a format other than YCBCR420P or is unset, input packets contain an ImageFrame. NOTE: in the current implementation, the input format (either specified in the input_format option or inferred from the input packets) must be SRGB or YCBCR420P.
If set to 2, the target width and height will be rounded-down to the nearest even number. If set to any positive value other than 2, preserve_aspect_ratio must be false and the target width and height will be rounded-down to multiples of the given value. If set to any value less than 1, it will be treated like 1. NOTE: If set to an odd number, the output format must be SRGB.
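The rounding rule described above amounts to an integer floor to the nearest multiple. The sketch below uses a hypothetical helper name and treats any value below 1 as 1, as the text specifies.

```python
def round_down_to_multiple(value: int, scale_to_multiple_of: int) -> int:
    """Round value down to a multiple of scale_to_multiple_of (values < 1 act as 1)."""
    multiple = max(1, scale_to_multiple_of)
    return (value // multiple) * multiple


print(round_down_to_multiple(641, 2))  # nearest even number below
print(round_down_to_multiple(645, 4))  # multiple of 4
print(round_down_to_multiple(645, 0))  # treated like 1, so unchanged
```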
If true, assume the input YUV is BT.709 (this is the HDTV standard, so most content is likely using it). If false use the previous assumption of BT.601 (mid-80s standard). Ideally this information should be contained in the input YUV Frame, but as of 02/06/2019, it's not. Once this info is baked in, this flag becomes useless.
Used in:
Option to disallow upscaling.
We wrap the enum in a message to avoid namespace collisions.
(message has no fields)
This enum mirrors the ScaleModes supported by Quad Renderer.
Used in: , ,
Stretch the frame to the exact provided output dimensions.
Scale the frame up to fit the drawing area, preserving aspect ratio; may letterbox.
Scale the frame up to fill the drawing area, preserving aspect ratio; may crop.
A proto that acts as the proxy of SerializationProxyTestClass for serialization.
The value to set the alpha channel to (0-255). This option is ignored when set to -1 (use image mask instead).
Number of side packets which are fed to graph internal streams.
If true, then a timestamp is set for each packet.
If true, then side packets are vectors of packets; otherwise, they are single packets.
We need to accommodate various timestamp modes depending on what we're connecting to.
Used in:
For vectors of packets, the timestamp is the index of the packet within the vector. For single packets, the timestamp is zero.
Timestamps are always set to PreStream.
Timestamps are always set to PostStream. TODO Rename to POST_STREAM.
Do not set timestamp. Can only be used if vectors_of_packets is true. Will cause Timestamp::Unset() run-time errors if the inner packets in the vectors do not already have Timestamps.
Non-linear similarity model (w.r.t. its parametrization). c_r := cos(rotation); s_r := sin(rotation); Transformation applied to x: [scale 0; 0 scale] * [c_r -s_r; s_r c_r] * x + [dx; dy]
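Written out for a single 2D point, the similarity model scales a rotated copy of the point and then translates it. The sketch below follows that reading (uniform scale, rotation matrix [c_r -s_r; s_r c_r], translation (dx, dy)); the function name is hypothetical.

```python
import math
from typing import Tuple


def apply_similarity(x: float, y: float, scale: float, rotation: float,
                     dx: float, dy: float) -> Tuple[float, float]:
    """Apply scale * R(rotation) * (x, y) + (dx, dy)."""
    c_r, s_r = math.cos(rotation), math.sin(rotation)
    # Rotate first ...
    rotated_x = c_r * x - s_r * y
    rotated_y = s_r * x + c_r * y
    # ... then scale uniformly and translate.
    return scale * rotated_x + dx, scale * rotated_y + dy


# Rotating (1, 0) by 90 degrees, doubling it, then shifting by (1, 1).
print(apply_similarity(1.0, 0.0, scale=2.0, rotation=math.pi / 2, dx=1.0, dy=1.0))
```

Because the scale matrix is a multiple of the identity, applying the scale before or after the rotation gives the same result.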
Used in:
angle in [-pi, pi].
A proto like InputCollection::Inputs which has embedded strings within it.
A proto3 calculator options for testing.
Analysis window duration in seconds. Required. Must be greater than 0. (Note: the spectrogram DFT length will be the smallest power-of-2 sample count that can hold this duration.)
Duration of overlap between adjacent windows. Hence, frame_rate = 1/(frame_duration_seconds - frame_overlap_seconds). Required that 0 <= frame_overlap_seconds < frame_duration_seconds.
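The relationship between window duration, overlap, DFT length and frame rate can be sketched as below. The function and its sample_rate_hz argument are hypothetical, and converting the duration to a sample count by rounding to the nearest integer is an assumption.

```python
from typing import Tuple


def spectrogram_geometry(sample_rate_hz: float,
                         frame_duration_seconds: float,
                         frame_overlap_seconds: float) -> Tuple[int, int, float]:
    """Return (samples per window, DFT length, output frame rate in Hz)."""
    if frame_duration_seconds <= 0:
        raise ValueError("frame_duration_seconds must be greater than 0")
    if not 0 <= frame_overlap_seconds < frame_duration_seconds:
        raise ValueError("need 0 <= frame_overlap_seconds < frame_duration_seconds")
    # Assumption: the window length in samples is rounded to the nearest integer.
    frame_samples = round(frame_duration_seconds * sample_rate_hz)
    # Smallest power of two that can hold the window.
    fft_length = 1
    while fft_length < frame_samples:
        fft_length *= 2
    frame_rate = 1.0 / (frame_duration_seconds - frame_overlap_seconds)
    return frame_samples, fft_length, frame_rate


# 25 ms windows with 15 ms overlap on 16 kHz audio.
print(spectrogram_geometry(16_000.0, 0.025, 0.015))
```

In this example a 400-sample window needs a 512-point DFT, and a new frame is produced about every 10 ms.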
Whether to pad the final packet with zeros. If true, guarantees that all input samples will output. If set to false, any partial packet at the end of the stream will be dropped.
If set to true then the output will be a vector of spectrograms, one for each channel and the stream will have a MultiStreamTimeSeriesHeader.
Support a fixed multiplicative scaling of the output. This is applied uniformly regardless of output type (i.e., even dBs are multiplied, not offset).
If use_local_timestamp is true, the output packet's timestamp is based on the last sample of the packet and it's inferred from the latest input packet's timestamp. If false, the output packet's timestamp is based on the cumulative timestamping, which is inferred from the initial input timestamp and the cumulative number of samples.
Output value type can be squared-magnitude, linear-magnitude, deciBels (dB, = 20*log10(linear_magnitude)), or std::complex.
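The four output types differ only in how a complex spectrogram bin is post-processed. The sketch below uses hypothetical string labels for the types and applies the multiplicative output scale uniformly, including to decibel values, as described for the scaling option.

```python
import math
from typing import Union


def format_output(value: complex, output_type: str,
                  output_scale: float = 1.0) -> Union[float, complex]:
    """Convert one complex bin to the requested output type, then scale it."""
    magnitude = abs(value)
    if output_type == "squared_magnitude":
        result: Union[float, complex] = magnitude ** 2
    elif output_type == "linear_magnitude":
        result = magnitude
    elif output_type == "decibels":
        # 20 * log10(linear magnitude); math.log10 raises ValueError for 0.
        result = 20.0 * math.log10(magnitude)
    elif output_type == "complex":
        result = value
    else:
        raise ValueError(f"unknown output type: {output_type}")
    # The scale multiplies every type, so decibel values are multiplied too.
    return result * output_scale


bin_value = 3.0 + 4.0j
for kind in ("squared_magnitude", "linear_magnitude", "decibels", "complex"):
    print(kind, format_output(bin_value, kind))
```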
Used in:
Which window to use when computing the FFT.
Used in:
The calculator computes log(x + stabilizer). stabilizer must be >= 0, with 0 indicating a lack of stabilization.
If true, CHECK that all input values are >= 0. If false, the code will take the log of the potentially negative input values plus the stabilizer.
Support a fixed multiplicative scaling of the output.
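Taken together, the three options above describe a stabilized logarithm with an optional input check and output scaling. The following sketch is an approximation of that behavior in Python: the function and argument names are hypothetical, and it raises ValueError where the text describes a CHECK failure.

```python
import math
from typing import Iterable, List


def stabilized_log(values: Iterable[float], stabilizer: float,
                   check_nonnegative_input: bool = True,
                   output_scale: float = 1.0) -> List[float]:
    """Compute output_scale * log(x + stabilizer) for every input value."""
    if stabilizer < 0:
        raise ValueError("stabilizer must be >= 0")
    result = []
    for x in values:
        if check_nonnegative_input and x < 0:
            raise ValueError("input values must be >= 0")
        # With stabilizer == 0 and x == 0, math.log raises ValueError.
        result.append(output_scale * math.log(x + stabilizer))
    return result


print(stabilized_log([0.0, 9.0, 99.0], stabilizer=1.0))
```

A positive stabilizer keeps the result finite for zero-valued inputs.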
The settings specifying a status handler and its required external inputs.
Used in:
The name of the registered status handler class.
required
The name of the input side packets. The StatusHandler can access its input side packets by index or by tag. A StatusHandler will only be called if all of its requested input side packets are available (and won't be called if a PacketFactory or PacketGenerator which produces one fails).
DEPRECATED(mgeorg) The old name for input_side_packet.
The options for the status handler.
Stores the profiling information of a stream.
Used in:
Stream name.
If true, then this is a back edge input stream and won't be profiled.
Total and histogram of the time that this stream took.
Options for a switch-container directing traffic to one of several contained subgraph or calculator nodes.
The contained registered subgraphs or calculators.
Activates the specified channel to receive input packets.
Activates channel 1 for enable = true, channel 0 otherwise.
Each synchronization set describes a collection of inputs which must be provided together to the calculator. Any streams which are not in any sync_set will be grouped into a (default) sync set.
Used in:
A description of the streams which will be synchronized together. This description uses the Calculator visible specification of a stream. The format is a tag, then an index with both being optional. If the tag is missing it is assumed to be "" and if the index is missing then it is assumed to be 0. If the index is provided then a colon (':') must be used. Examples: "TAG" -> tag "TAG", index 0; "" -> tag "", index 0; ":0" -> tag "", index 0; ":3" -> tag "", index 3; "VIDEO:0" -> tag "VIDEO", index 0; "VIDEO:2" -> tag "VIDEO", index 2.
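The tag and index rules above can be illustrated with a small parser. This is a sketch of the stated format only, not the framework's own parsing code; the function name is hypothetical, and inputs outside the documented examples (such as a trailing colon with no index) are simply rejected by `int()`.

```python
from typing import Tuple


def parse_tag_and_index(spec: str) -> Tuple[str, int]:
    """Split a stream description into (tag, index) using the documented defaults."""
    if ":" in spec:
        # A colon is required whenever an index is given.
        tag, index_text = spec.split(":", 1)
        return tag, int(index_text)
    # No colon: the whole string is the tag and the index defaults to 0.
    return spec, 0


for spec in ["TAG", "", ":0", ":3", "VIDEO:0", "VIDEO:2"]:
    print(repr(spec), "->", parse_tag_and_index(spec))
```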
The value for a template parameter. The value can be either a simple value, a dictionary, or a list.
Used in:
A string value for the parameter.
A numeric value for the parameter.
A dictionary of values for the parameter.
An ordered list of values for the parameter.
A dictionary of parameter values.
Used in: ,
A map from parameter name to parameter value.
Used in:
A template rule or a template rule argument expression.
Used in:
A template parameter name or a literal value.
A template rule operation or a template expression operation.
Nested template expressions, which define the operation args. TODO: Rename this field to avoid collision with TemplateDict::arg.
The path within the protobuf to the modified field values.
The FieldDescriptor::Type of the modified field.
Alternative value for the modified field, in protobuf binary format.
Options for a mediapipe template subgraph consisting of mediapipe template arguments.
The template arguments used to expand the template for the subgraph.
Internal data structure used during temporal IRLS smoothing.
Used in:
Full Example: node { calculator: "TfLiteConverterCalculator" input_stream: "IMAGE_IN:input_image" output_stream: "TENSOR_OUT:image_tensor" options { [mediapipe.TengineConverterCalculatorOptions.ext] { zero_center: true } } }
Choose normalization mode for output (not applied for Matrix inputs). true = [-1,1] false = [0,1] Ignored if using quantization.
Custom settings to override the internal scaling factors `div` and `sub`. Both values must be set to non-negative values. Will only take effect on CPU AND when |use_custom_normalization| is set to true. When these custom values take effect, the |zero_center| setting above will be overridden, and the normalized_value will be calculated as: normalized_value = input / custom_div - custom_sub.
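A minimal sketch of how the normalization options might interact is shown below. The custom formula is the one quoted above; the default branches assume 8-bit input in [0, 255], which is an assumption of this example and is not stated in the option descriptions. The function name `normalize_pixel` is hypothetical.

```python
def normalize_pixel(value, zero_center=True, use_custom_normalization=False,
                    custom_div=None, custom_sub=None):
    """Normalize one pixel value (illustrative only)."""
    if use_custom_normalization:
        if custom_div is None or custom_sub is None:
            raise ValueError("custom_div and custom_sub must both be set")
        if custom_div < 0 or custom_sub < 0:
            raise ValueError("custom_div and custom_sub must be non-negative")
        # Formula from the option description; overrides zero_center.
        return value / custom_div - custom_sub
    if zero_center:
        return value / 127.5 - 1.0  # assumed 8-bit input mapped to [-1, 1]
    return value / 255.0            # assumed 8-bit input mapped to [0, 1]
```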
Whether the input image should be flipped vertically (along the y-direction). This is useful, for example, when the input image is defined with a coordinate system where the origin is at the bottom-left corner (e.g., in OpenGL) whereas the ML model expects an image with a top-left origin.
Controls how many channels of the input image get passed through to the tensor. Valid values are 1,3,4 only. Ignored for iOS GPU.
The calculator expects Matrix inputs to be in column-major order. Set row_major_matrix to true if the inputs are in row-major order.
Quantization option (CPU only). When true, output kTfLiteUInt8 tensor instead of kTfLiteFloat32.
Normalization option.
Used in:
Full Example: node { calculator: "TfLiteInferenceCalculator" input_stream: "TENSOR_IN:image_tensors" output_stream: "TENSOR_OUT:result_tensors" options { [mediapipe.TengineInferenceCalculatorOptions.ext] { model_path: "model.tflite" delegate { gpu {} } } } }
Path to the TF Lite model (ex: /path/to/modelname.tflite). On mobile, this is generally just modelname.tflite.
Whether the TF Lite GPU or CPU backend should be used. Effective only when input tensors are on CPU. For input tensors on GPU, GPU backend is always used. DEPRECATED: configure "delegate" instead.
Android only. When true, an NNAPI delegate will be used for inference. If NNAPI is not available, then the default CPU delegate will be used automatically. DEPRECATED: configure "delegate" instead.
The number of threads available to the interpreter. Effective only when input tensors are on CPU and 'use_gpu' is false.
TfLite delegate to run inference. If not specified, when any of the input and output is on GPU (i.e., using the TENSORS_GPU tag) TFLite GPU delegate is used (as if "gpu {}" is specified), or otherwise regular TFLite on CPU is used (as if "tflite {}" is specified) except when building with emscripten where xnnpack is used. NOTE: use_gpu/use_nnapi are ignored if specified. (Delegate takes precedence over use_* deprecated options.)
Used in:
Delegate to run GPU inference depending on the device. (Can use OpenGL, OpenCL, or Metal depending on the device.)
Used in:
Experimental, Android/Linux only. Use TFLite GPU delegate API2 for the NN inference. example: delegate: { gpu { use_advanced_gpu_api: true } }
This option is valid for TFLite GPU delegate API2 only. Set to true to use 16-bit float precision. If max precision is needed, set to false for 32-bit float calculations only.
Load pre-compiled serialized binary cache to accelerate init process. Only available for OpenCL delegate on Android. Kernel caching will only be enabled if this path is set.
This option is valid for TFLite GPU delegate API2 only. Choose any of the available APIs to force running inference using it.
Used in:
Android only.
Used in:
(message has no fields)
Default inference provided by tflite.
Used in:
(message has no fields)
Used in:
Number of threads for XNNPACK delegate. (By default, calculator tries to choose optimal number of threads depending on the device.)
The number of output classes predicted by the detection model.
The number of output boxes predicted by the detection model.
The number of output values per box predicted by the detection model. The values contain bounding boxes, keypoints, etc.
The offset of keypoint coordinates in the location tensor.
The number of predicted keypoints.
The dimension of each keypoint, i.e. number of values predicted for each keypoint.
The offset of box coordinates in the location tensor.
Parameters for decoding SSD detection model.
Whether to reverse the order of predicted x, y from output. If false, the order is [y_center, x_center, h, w], if true the order is [x_center, y_center, w, h].
The ids of classes that should be ignored during decoding the score for each predicted box.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Score threshold for preserving decoded detections.
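The coordinate-order and vertical-flip options above could be illustrated as follows. This sketch only covers the reordering and flipping of four already-decoded box values; it does not perform anchor-based SSD decoding. The function name `decode_box` is hypothetical, and the flip assumes coordinates normalized to [0, 1].

```python
def decode_box(raw, reverse_output_order=False, flip_vertically=False):
    """Return (x_center, y_center, w, h) from four raw values (illustrative only)."""
    if reverse_output_order:
        x_center, y_center, w, h = raw
    else:
        # Default order is [y_center, x_center, h, w].
        y_center, x_center, h, w = raw
    if flip_vertically:
        # Assumption: normalized coordinates, so flipping mirrors around 0.5.
        y_center = 1.0 - y_center
    return x_center, y_center, w, h
```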
The number of output classes predicted by the detection model.
The number of output boxes predicted by the detection model.
The number of output values per box predicted by the detection model. The values contain bounding boxes, keypoints, etc.
Parameters for decoding SSD detection model.
Whether to reverse the order of predicted x, y from output. If false, the order is [y_center, x_center, h, w], if true the order is [x_center, y_center, w, h].
The ids of classes that should be ignored during decoding the score for each predicted box.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Score threshold for preserving decoded detections.
Full Example: node { calculator: "TensorConverterCalculator" input_stream: "IMAGE_IN:input_image" output_stream: "TENSOR_OUT:image_tensor" options { [mediapipe.TensorConverterCalculatorOptions.ext] { zero_center: true } } }
Choose normalization mode for output (not applied for Matrix inputs). true = [-1,1] false = [0,1] Ignored if using quantization.
Custom settings to override the internal scaling factors `div` and `sub`. Both values must be set to non-negative values. Will only take effect on CPU AND when |use_custom_normalization| is set to true. When these custom values take effect, the |zero_center| setting above will be overridden, and the normalized_value will be calculated as: normalized_value = input / custom_div - custom_sub.
Whether the input image should be flipped vertically (along the y-direction). This is useful, for example, when the input image is defined with a coordinate system where the origin is at the bottom-left corner (e.g., in OpenGL) whereas the ML model expects an image with a top-left origin.
Controls how many channels of the input image get passed through to the tensor. Valid values are 1,3,4 only. Ignored for iOS GPU.
The calculator expects Matrix inputs to be in column-major order. Set row_major_matrix to true if the inputs are in row-major order.
Quantization option (CPU only). When true, output kUint8 tensor instead of kFloat32.
Normalization option. Setting normalization_range results in the values normalized to the range [output_tensor_float_range.min, output_tensor_float_range.max].
Used in:
Full Example: node { calculator: "TfLiteConverterCalculator" input_stream: "IMAGE_IN:input_image" output_stream: "TENSOR_OUT:image_tensor" options { [mediapipe.TensorrtConverterCalculatorOptions.ext] { zero_center: true } } }
Choose normalization mode for output (not applied for Matrix inputs). true = [-1,1] false = [0,1] Ignored if using quantization.
Custom settings to override the internal scaling factors `div` and `sub`. Both values must be set to non-negative values. Will only take effect on CPU AND when |use_custom_normalization| is set to true. When these custom values take effect, the |zero_center| setting above will be overridden, and the normalized_value will be calculated as: normalized_value = input / custom_div - custom_sub.
Whether the input image should be flipped vertically (along the y-direction). This is useful, for example, when the input image is defined with a coordinate system where the origin is at the bottom-left corner (e.g., in OpenGL) whereas the ML model expects an image with a top-left origin.
Controls how many channels of the input image get passed through to the tensor. Valid values are 1,3,4 only. Ignored for iOS GPU.
The calculator expects Matrix inputs to be in column-major order. Set row_major_matrix to true if the inputs are in row-major order.
Quantization option (CPU only). When true, output kTfLiteUInt8 tensor instead of kTfLiteFloat32.
Normalization option.
Used in:
Full Example: node { calculator: "TfLiteInferenceCalculator" input_stream: "TENSOR_IN:image_tensors" output_stream: "TENSOR_OUT:result_tensors" options { [mediapipe.TensorrtInferenceCalculatorOptions.ext] { model_path: "model.tflite" delegate { gpu {} } } } }
Path to the TF Lite model (ex: /path/to/modelname.tflite). On mobile, this is generally just modelname.tflite.
Whether the TF Lite GPU or CPU backend should be used. Effective only when input tensors are on CPU. For input tensors on GPU, GPU backend is always used. DEPRECATED: configure "delegate" instead.
Android only. When true, an NNAPI delegate will be used for inference. If NNAPI is not available, then the default CPU delegate will be used automatically. DEPRECATED: configure "delegate" instead.
The number of threads available to the interpreter. Effective only when input tensors are on CPU and 'use_gpu' is false.
TfLite delegate to run inference. If not specified, when any of the input and output is on GPU (i.e., using the TENSORS_GPU tag) TFLite GPU delegate is used (as if "gpu {}" is specified), or otherwise regular TFLite on CPU is used (as if "tflite {}" is specified) except when building with emscripten where xnnpack is used. NOTE: use_gpu/use_nnapi are ignored if specified. (Delegate takes precedence over use_* deprecated options.)
Used in:
Delegate to run GPU inference depending on the device. (Can use OpenGL, OpenCL, or Metal depending on the device.)
Used in:
Experimental, Android/Linux only. Use TFLite GPU delegate API2 for the NN inference. example: delegate: { gpu { use_advanced_gpu_api: true } }
This option is valid for TFLite GPU delegate API2 only. Set to true to use 16-bit float precision. If max precision is needed, set to false for 32-bit float calculations only.
Load pre-compiled serialized binary cache to accelerate init process. Only available for OpenCL delegate on Android. Kernel caching will only be enabled if this path is set.
This option is valid for TFLite GPU delegate API2 only. Choose any of the available APIs to force running inference using it.
Used in:
Android only.
Used in:
(message has no fields)
Default inference provided by tflite.
Used in:
(message has no fields)
Used in:
Number of threads for XNNPACK delegate. (By default, calculator tries to choose optimal number of threads depending on the device.)
The number of output classes predicted by the detection model.
The number of output boxes predicted by the detection model.
The number of output values per box predicted by the detection model. The values contain bounding boxes, keypoints, etc.
The offset of keypoint coordinates in the location tensor.
The number of predicted keypoints.
The dimension of each keypoint, i.e. number of values predicted for each keypoint.
The offset of box coordinates in the location tensor.
Parameters for decoding SSD detection model.
Whether to reverse the order of predicted x, y from output. If false, the order is [y_center, x_center, h, w], if true the order is [x_center, y_center, w, h].
The ids of classes that should be ignored during decoding the score for each predicted box.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Score threshold for preserving decoded detections.
The number of output classes predicted by the detection model.
The number of output boxes predicted by the detection model.
The number of output values per box predicted by the detection model. The values contain bounding boxes, keypoints, etc.
Parameters for decoding SSD detection model.
Whether to reverse the order of predicted x, y from output. If false, the order is [y_center, x_center, h, w], if true the order is [x_center, y_center, w, h].
The ids of classes that should be ignored during decoding the score for each predicted box.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Score threshold for preserving decoded detections.
Score threshold for preserving the class.
Number of highest scoring labels to output. If top_k is not positive then all labels are used.
Path to a label map file for getting the actual name of class ids.
Label map. (Can be used instead of label_map_path.) NOTE: "label_map_path", if specified, takes precedence over "label_map".
Whether the input is a single float for binary classification. When true, only a single float is expected in the input tensor and the label map, if provided, is expected to have exactly two labels. The single score (float) represents the probability of the first label, and 1 - score is the probability of the second label.
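The classification options above (score threshold, top_k, and the binary-classification expansion) might combine as in the sketch below. The function names and the placeholder labels are hypothetical, and treating the threshold as inclusive (`>=`) is an assumption of this example.

```python
def expand_binary_score(score, labels=("first", "second")):
    """Expand a single probability into scores for two labels."""
    return {labels[0]: score, labels[1]: 1.0 - score}

def select_labels(scores, min_score_threshold=None, top_k=0):
    """Return (label, score) pairs sorted by descending score (illustrative only)."""
    items = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if min_score_threshold is not None:
        # Assumption: a score equal to the threshold is kept.
        items = [kv for kv in items if kv[1] >= min_score_threshold]
    if top_k > 0:
        # A non-positive top_k means all labels are used.
        items = items[:top_k]
    return items
```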
Used in:
Used in:
[Required] The number of output classes predicted by the detection model.
[Required] The number of output boxes predicted by the detection model.
[Required] The number of output values per box predicted by the detection model. The values contain bounding boxes, keypoints, etc.
The offset of keypoint coordinates in the location tensor.
The number of predicted keypoints.
The dimension of each keypoint, i.e. number of values predicted for each keypoint.
The offset of box coordinates in the location tensor.
Parameters for decoding SSD detection model.
Whether to reverse the order of predicted x, y from output. If false, the order is [y_center, x_center, h, w], if true the order is [x_center, y_center, w, h].
The ids of classes that should be ignored during decoding the score for each predicted box. Can be overridden with IGNORE_CLASSES side packet.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Score threshold for preserving decoded detections.
Apply activation function to the floats.
Used in:
[Required] Number of landmarks from the output of the model.
Size of the input image for the model. These options are used only when normalized landmarks are needed. Z coordinate is scaled as X assuming a weak perspective projection camera model.
Whether the detection coordinates from the input tensors should be flipped vertically (along the y-direction). This is useful, for example, when the input tensors represent detections defined with a coordinate system where the origin is at the top-left corner, whereas the desired detection representation has a bottom-left origin (e.g., in OpenGL).
Whether the detection coordinates from the input tensors should be flipped horizontally (along the x-direction). This is useful, for example, when the input image is horizontally flipped in ImageTransformationCalculator beforehand.
A value that Z coordinates should be divided by. This option is used only when normalized landmarks are needed. It is applied in addition to Z coordinate being re-scaled as X.
Apply activation function to the tensor representing landmark visibility.
Apply activation function to the tensor representing landmark presence.
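The image-size and Z-normalization options above could be sketched as follows: X and Y are divided by the input image width and height, Z is scaled like X, and Z is then additionally divided by the Z-normalization value. The function and parameter names are hypothetical and only mirror the descriptions above.

```python
def normalize_landmark(x, y, z, input_image_width, input_image_height, normalize_z=1.0):
    """Return a normalized (x, y, z) landmark (illustrative only)."""
    norm_x = x / input_image_width
    norm_y = y / input_image_height
    # Z is re-scaled as X (weak perspective assumption), then divided again.
    norm_z = z / input_image_width / normalize_z
    return norm_x, norm_y, norm_z
```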
Used in:
For CONVENTIONAL mode in OpenGL, textures start at the bottom and need to be flipped vertically, as tensors are expected to start at the top. (DEFAULT or unset is interpreted as CONVENTIONAL.)
Activation function to apply to input tensor. Softmax requires a 2-channel tensor, see output_layer_index below.
Channel to use for processing tensor. Only applies when using activation=SOFTMAX. Works on two channel input tensor only.
Supported activation functions for filtering.
Used in:
Assumes 1-channel input tensor.
Assumes 1-channel input tensor.
Assumes 2-channel input tensor.
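To make the channel requirements concrete, here is a small per-pixel sketch of a sigmoid on a 1-channel value and a softmax on a 2-channel value. The function names are hypothetical; `output_layer_index` selects which of the two softmax channels is returned, as described above.

```python
import math

def sigmoid(value):
    """Sigmoid activation for a single-channel value."""
    return 1.0 / (1.0 + math.exp(-value))

def softmax_channel(pixel, output_layer_index=1):
    """Softmax over a 2-channel pixel, returning the selected channel."""
    if len(pixel) != 2:
        raise ValueError("softmax requires a 2-channel input")
    peak = max(pixel)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in pixel]
    return exps[output_layer_index] / sum(exps)
```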
Number of threads for running calculators in multithreaded mode. When ThreadPoolExecutorOptions is used in the ExecutorOptions for the default executor with the executor type unspecified, the num_threads field is allowed to be -1 or 0. If not specified or -1, the scheduler will pick an appropriate number of threads depending on the number of available processors.
Make all worker threads have the specified stack size (in bytes). NOTE: The stack_size option may not be implemented on some platforms.
The nice priority level of the worker threads. The nice priority level is 0 by default, and lower value means higher priority. The valid thread nice priority level value range varies by OS. Refer to system documentation for more details.
The performance hint of the processor(s) that the threads will be bound to. Framework will make the best effort to run the threads on the specific processors based on the performance hint. The attempt may fail for various reasons. Success isn't guaranteed.
Name prefix for worker threads, which can be useful for debugging multithreaded applications.
Processor performance enum.
Used in:
Stores the profiling information. It is the responsibility of the user of this message to make sure the 'total' field and the interval information (num, size and count) are in a valid state and all get updated together. Each interval of the histogram is closed on the lower range and open on the higher end. An example histogram with interval_size=1000 and num_interval=3 will have the following intervals:
- First interval = [0, 1000)
- Second interval = [1000, 2000)
- Third interval = [2000, +inf)
IMPORTANT: If you add any new field, update CalculatorProfiler::Reset() accordingly.
Used in: ,
Total time (in microseconds).
Size of the runtimes histogram intervals (in microseconds) to generate the histogram of the Process() time. The last interval extends to +inf.
Number of intervals to generate the histogram of the Process() runtime.
Number of calls in each interval.
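The interval rule above (closed lower bound, open upper bound, last interval extending to +inf) can be sketched as a bucket lookup. The function names here are hypothetical; runtimes are assumed to be non-negative integers in microseconds.

```python
def interval_index(runtime_usec, interval_size_usec, num_intervals):
    """Return the histogram interval that a runtime falls into."""
    # Everything past the last boundary lands in the final interval.
    return min(runtime_usec // interval_size_usec, num_intervals - 1)

def record_runtime(total_usec, counts, runtime_usec, interval_size_usec):
    """Update the counts in place and return the new total, keeping both in sync."""
    counts[interval_index(runtime_usec, interval_size_usec, len(counts))] += 1
    return total_usec + runtime_usec
```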
Frame duration in seconds. Required. Must be greater than 0. This is rounded to the nearest integer number of samples.
Frame overlap in seconds. If emulate_fractional_frame_overlap is false (the default), then the frame overlap is rounded to the nearest integer number of samples, and the step from one frame to the next will be the difference between the number of samples in a frame and the number of samples in the overlap. If emulate_fractional_frame_overlap is true, then frame overlap will be a variable number of samples, such that the long-time average time step from one frame to the next will be the difference between the (nominal, not rounded) frame_duration_seconds and frame_overlap_seconds. This is useful where the desired time step is not an integral number of input samples. A negative frame_overlap_seconds corresponds to skipping some input samples between each frame of emitted samples. Required that frame_overlap_seconds < frame_duration_seconds.
See frame_overlap_seconds for semantics.
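For the default case (emulate_fractional_frame_overlap is false), the frame length and step in samples could be computed roughly as below. The function names are hypothetical, and rounding ties upward is an assumption of this sketch; the description above only says "nearest integer".

```python
import math

def round_to_nearest(value):
    """Round to the nearest integer (ties rounded up; an assumption)."""
    return math.floor(value + 0.5)

def frame_and_step_samples(frame_duration_seconds, frame_overlap_seconds, sample_rate):
    """Return (samples per frame, step between frames), both in samples."""
    if frame_duration_seconds <= 0:
        raise ValueError("frame_duration_seconds must be greater than 0")
    if frame_overlap_seconds >= frame_duration_seconds:
        raise ValueError("frame_overlap_seconds must be < frame_duration_seconds")
    frame_samples = round_to_nearest(frame_duration_seconds * sample_rate)
    overlap_samples = round_to_nearest(frame_overlap_seconds * sample_rate)
    # A negative overlap makes the step larger than the frame, skipping samples.
    return frame_samples, frame_samples - overlap_samples
```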
Whether to pad the final packet with zeros. If true, guarantees that all input samples (other than those that fall in gaps implied by negative frame_overlap_seconds) will be emitted. If set to false, any partial packet at the end of the stream will be dropped.
If use_local_timestamp is true, the output packet's timestamp is based on the last sample of the packet and is inferred from the latest input packet's timestamp. If false, the output packet's timestamp is based on cumulative timestamping, which is inferred from the initial input timestamp and the cumulative number of samples.
Optional windowing function. The default is NONE (no windowing function).
Used in:
Header for a uniformly sampled time series stream. Each Packet in the stream is a Matrix, and each column is a (vector-valued) sample of the series, i.e. each column corresponds to a distinct sample in time.
Used in:
Number of samples per second (hertz). The sample_rate is the reciprocal of the period between consecutive samples within a packet. Required, and must be greater than zero.
The number of channels in each sample. This is the number of rows in the matrix. Required, and must be greater than zero.
For streams that output a fixed number of samples per packet. This field should not be set if the number of samples varies from packet to packet. This is the number of columns in the matrix.
For streams that output Packets at a fixed rate, in Packets per second. In other words, the reciprocal of the difference between consecutive Packet timestamps.
Spectral representations (e.g. from SpectrogramCalculator) will have their sample_rate field indicating the frame rate (e.g. 100 Hz), but downstream consumers need to know the sample_rate of the source time-domain waveform in order to correctly interpret the spectral bins. Units are hertz.
Path to a label map file for getting the actual name of detected classes.
Color of boxes.
Thickness of the drawing of boxes.
Next tag: 14 Proto equivalent of struct TimedBox.
Used in: ,
Normalized coords - in [0, 1]
Rotation of box w.r.t. center in radians.
Unique per object id to disambiguate boxes.
Box label name.
Confidence of box tracked in the range [0, 1], with 0 being least confident, and 1 being most confident. A reasonable threshold is 0.5 to filter out unconfident boxes.
Aspect ratio (width / height) for the tracked rectangle in physical space. If this field is provided, quad tracking will be performed using 6 degrees of freedom perspective transform between physical rectangle and frame quad. Otherwise, 8 degrees of freedom homography tracking between adjacent frames will be used.
Whether or not to enable reacquisition functionality for this specific box.
Whether we want this box to be potentially grouped with other boxes to track together. This is useful for tracking small boxes that lie on a plane. For example, when we detect a plane, track the plane, then all boxes within the plane can share the same homography transform.
Used in:
The TAG:index of the input stream used as the timestamp base. TimestampAlignInputStreamHandler aligns the packet timestamps of all other input streams with the packet timestamps of this input stream.
Capture tone change between two frames and per-frame tone statistics. The estimated tone change describes the transformation of color intensities from the current to the previous frame. Next tag: 16
TODO: Implement.
Fraction of clipped pixels in [0, 1]. A pixel is considered clipped if more than ToneEstimationOptions::max_clipped_channels are over- or under-exposed.
[low|mid|high]_percentile's.
If set, all models are estimated in log domain, specifically intensity I is transformed via log(1.0 + I) := I'. Consequently, after applying the models, intensity needs to be transformed back to visible range via exp(I') - 1.0.
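The forward and inverse log-domain transforms described above can be written directly with the standard library. The function names are hypothetical.

```python
import math

def to_log_domain(intensity):
    """I' = log(1.0 + I)."""
    return math.log1p(intensity)

def from_log_domain(log_intensity):
    """I = exp(I') - 1.0, the inverse of to_log_domain."""
    return math.expm1(log_intensity)
```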
Stats based on stability analysis.
Used in:
Number of tone matches that were inliers (used for tone estimation).
Fraction of tone matches that were inliers.
Total IRLS weight summed over all inliers.
ToneChange type indicates whether highest degree of freedom (DOF) model estimation was deemed stable, in which case ToneChange::Type is set to VALID. If a model was deemed not stable (according to *StabilityBounds in ToneEstimationOptions), it is set to the lower dof type which was deemed stable.
Used in:
Identity model, gain bias unreliable.
Next tag: 13
Percentiles for tone statistics.
Specify the size of either dimension here, the frame will be downsampled to fit downsampling_size.
We support down-sampling of an incoming frame before running the resolution dependent part of the tone estimation.
Used in:
no downsampling.
downsizes frame such that frame_size == downsampling_size. frame_size := max(width, height).
downsizes frame by pre-defined factor.
downsizes frame such that frame_size ==
Used in: ,
Accept 2% intensity difference as valid inlier.
Used in:
Intensity in current frame.
Matching intensity in previous frame.
Used in:
ToneChange's are fit to ToneMatches extracted from matching patches, using order statistics of their corresponding intensities. Matches are defined by having the same percentile of ordered intensities. If any member of the ToneMatch is under-exposed or over-exposed, the match is discarded (based on parameters min and max_exposure above). Matches are extracted from min_match_percentile to max_match_percentile in #match_percentile_steps equidistant steps.
Patch radius from which order statistics are collected.
Only matches with not too many pixels over- or underexposed are used.
If set, matches will be collected in the log domain.
How many highest scoring packets to output.
If set, only keep the scores that are greater than the threshold.
Path to a label map file for getting the actual name of classes.
Next tag: 42
Used in:
If set, and one of the TRACKING_DEGREE_OBJECT degrees is set, also applies camera motion in addition to the object motion.
Number of iterations to iteratively estimate model and re-estimate influence of each vector.
Gaussian spatial prior sigma relative to box size. For motivation, see this plot: http://goo.gl/BCfcy.
Gaussian velocity prior sigma. It is computed as the maximum of the absolute minimum sigma (in normalized domain) and the relative sigma w.r.t. previous motion.
Settings for motion disparity. Difference between previous and current motion magnitude is scored linearly, from motion_disparity_low_level to motion_disparity_high_level (mapped to score of 0 and 1 respectively). Motivation is to ensure acceleration between frames is within reasonable bounds. Represents a maximum acceleration of around 4 - 5 pixels per frame in 360p video to be unpenalized, with accelerations of around >= 10 pixels being considered inconsistent with prediction.
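The linear scoring between a low and a high level can be sketched as below. Clamping values outside the range to 0 and 1 is an assumption of this example, and the function name `linear_score` is hypothetical.

```python
def linear_score(value, low_level, high_level):
    """Map value linearly so that low_level -> 0 and high_level -> 1."""
    if high_level <= low_level:
        raise ValueError("high_level must be greater than low_level")
    fraction = (value - low_level) / (high_level - low_level)
    # Assumption: values outside [low_level, high_level] are clamped.
    return min(1.0, max(0.0, fraction))
```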
Motion disparity decays across frames. The disparity of the previous frame decays over time, and the larger of the current and decayed disparity is taken. Motivation is that if acceleration was unreasonably high (and we likely lost tracking), we enter a stage of trying to regain tracking by looking for vectors that agree with the previous prediction.
Object motion is given as linear combination of previous and measured motion depending on the motion_disparity (a high disparity is giving high weight to the previous motion). We enforce at least a minimum of the below motion_prior_weight regardless of the motion disparity.
Settings for motion discrimination. Current motion magnitude is scored linearly, from background_discrimination_low_level to background_discrimination_high_level (mapped to score of 0 and 1 respectively). Motivation is that high object motions are easy to discriminate from the background, whereas small object motions are virtually indistinguishable. Represents a range of 2 - 4 pixels for 360p video.
Spring force settings. If the predicted center of the box in the next frame and the predicted center of the inliers differ by more than inlier_center_relative_distance times the box [width|height], a spring force is applied to the box. The amount of force is spring_force times the difference.
Same as above, but for the center of large motion magnitudes.
Spring force towards large motions is only applied when kinetic energy is above the specified threshold.
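The spring-force rule for the inlier center could look roughly like the sketch below. Applying the test independently per axis and pulling the box toward the inlier center are both assumptions of this example; the function name `spring_correction` is hypothetical.

```python
def spring_correction(box_center, inlier_center, box_size,
                      inlier_center_relative_distance, spring_force):
    """Return a per-axis (x, y) correction for the box (illustrative only)."""
    correction = []
    for box_c, inlier_c, size in zip(box_center, inlier_center, box_size):
        difference = inlier_c - box_c
        if abs(difference) > inlier_center_relative_distance * size:
            # Force is spring_force times the difference.
            correction.append(spring_force * difference)
        else:
            correction.append(0.0)
    return tuple(correction)
```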
Bias of old velocity during update step.
Maximum number of frames considered to be tracking failures -> If over threshold, box is considered untrackable.
Domain used for tracking is always larger than the current box. If current motion is not negligible, box is expanded in the direction of the motion, otherwise expanded in all directions by the amount specified below (w.r.t. normalized domain).
Features are scored based on the magnitude of their irls weights, mapped to [0, 1] using the following range. The range represents roughly 3 - 1.5 pixels error for 360p video.
Kinetic energy decays over time by the specified rate.
Amount by which prior is increased/decreased in case of valid/invalid measurements.
We map the amount of present kinetic energy linearly to the domain [0, 1] describing if an object is static (0) or moving (1).
~0.4 pix
~3 pix
Outputs internal state to MotionBoxState.
Specifies which weights are stored in the internal state. By default post-estimation weights are stored, otherwise pre-estimation weights are stored.
Computes spatial grid of inliers and stores it in the MotionBoxState.
Ratio between static motion and temporal scale. This is actually the threshold on speed, under which we consider the object static (non-moving).
If number of continued inliers is less than this number, then the object motion model will fall back to translation model. Set this min_continued_inliers threshold to a low number to make sure they follow local object rotation and scale, but it may result in un-robust rotation and scale estimation if the threshold is too low. Recommend that you don't set a number < 4.
Maximum acceptable scale component of object similarity transform. Minimum scale is computed as 1.0 / max_scale. Exclusive for tracking a box with similarity.
Maximum acceptable object similarity rotation in radians.
Homography transform will first be projected to similarity, and the scale component of the similarity transform should be within the range of [1.0 / max_scale, max_scale].
The rotation component of the projected similarity should be smaller than this maximum rotation threshold.
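The scale and rotation bounds above can be expressed as a simple acceptance check. Comparing the absolute value of the rotation is an assumption of this sketch, and the function name is hypothetical.

```python
def similarity_within_bounds(scale, rotation, max_scale, max_rotation):
    """Check scale in [1 / max_scale, max_scale] and |rotation| < max_rotation."""
    scale_ok = (1.0 / max_scale) <= scale <= max_scale
    # Assumption: the rotation limit applies to the magnitude (radians).
    rotation_ok = abs(rotation) < max_rotation
    return scale_ok and rotation_ok
```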
Specifically for quad tracking (aka TRACKING_DEGREE_OBJECT_PERSPECTIVE mode), if aspect_ratio field is set in start pos, pnp tracking will be deployed. If aspect_ratio is unknown (not set), but forced_pnp_tracking is true, we will first estimate the aspect ratio for the 3D quadrangle, then perform pnp tracking. If aspect_ratio is unknown and pnp tracking is not forced, general homography tracking will be deployed.
Pre-calibrated camera intrinsics parameters, including focal length, center point, distortion coefficients (only 3 radial factors) and image width / height. The image formation model is described here: https://docs.opencv.org/2.4/doc/tutorials/calib3d/camera_calibration/camera_calibration.html Only used for quad tracking mode. Leave it empty if unknown.
Used in:
Different control parameters to terminate tracking when occlusion occurs.
Used in:
IRLS initialization: performs several rounds of RANSAC to preselect features for motion estimation, scoring outliers low and giving inliers at least the median inlier weight.
Used in:
Rounds of RANSAC.
Normalized cutoff threshold for a vector to be considered an inlier.
Degrees of freedom being used for tracking. By default tracker only uses translation. Additionally scale and rotation from the camera motion and / or object motion can be taken into account.
Used in:
Additional tracking degrees according to camera motion.
TODO: Implement!
Tracking degrees modeling object motion. Note that additional object degrees of freedom are only applied when estimation is deemed stable, in particular sufficient inliers are present. By default, does NOT apply camera motion. If that is desired set the flag: track_object_and_camera to true.
Used in:
When we compare two detection boxes, if the ratio of their areas is larger than is_same_detection_max_area_ratio, we consider them to be different detections.
When we compare two detection boxes, if the overlap ratio is larger than is_same_detection_min_overlap_ratio, we consider them to be the same detection.
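As a rough illustration of how the two thresholds could interact, the sketch below checks the area ratio first and the overlap second. Several details are assumptions, not taken from the source: boxes are (x_min, y_min, x_max, y_max) tuples, the area ratio is larger area over smaller area, the overlap ratio is intersection over union, and the two checks are combined in this order.

```python
def box_area(box):
    """Area of an (x_min, y_min, x_max, y_max) box; zero if degenerate."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def is_same_detection(a, b, max_area_ratio, min_overlap_ratio):
    """Hypothetical comparison of two detection boxes (assumptions above)."""
    area_a, area_b = box_area(a), box_area(b)
    if area_a <= 0.0 or area_b <= 0.0:
        return False
    # Assumed definition: larger area divided by smaller area.
    if max(area_a, area_b) / min(area_a, area_b) > max_area_ratio:
        return False
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    intersection = max(0.0, inter_w) * max(0.0, inter_h)
    # Assumed definition: intersection over union.
    overlap = intersection / (area_a + area_b - intersection)
    return overlap > min_overlap_ratio
```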
TrackingContainer is a self-describing container format to store arbitrary chunks of binary data. Each container is typed via its 4 character header, versioned via an int, and followed by the size of the binary data and the actual data. Designed for clients without protobuffer support. Note: This message is mainly used for documentation purposes and uses custom encoding as specified by FlowPackager::TrackingContainerFormatToBinary. Default binary size of a TrackingContainer (DO NOT CHANGE!): header (4 bytes) + version (4 bytes) + size (4 bytes) + data (#size bytes); total: 12 + #size bytes.
Used in:
4 character header.
Version information.
Size of binary data held by container.
Binary data encoded.
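The 12 + #size byte layout of a TrackingContainer can be illustrated with Python's struct module. This is only a sketch of the layout as documented here, not FlowPackager's encoder: the little-endian byte order and the signed 32-bit integer type are assumptions, and the function names are hypothetical.

```python
import struct

# Assumption: little-endian, signed 32-bit version and size fields.
_PREFIX = struct.Struct("<4sii")  # header (4) + version (4) + size (4) = 12


def pack_container(header: bytes, version: int, data: bytes) -> bytes:
    """Serialize one container as header | version | size | data."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    return _PREFIX.pack(header, version, len(data)) + data


def unpack_container(buf: bytes, offset: int = 0):
    """Parse one container; returns (header, version, data, next_offset)."""
    header, version, size = _PREFIX.unpack_from(buf, offset)
    start = offset + _PREFIX.size
    data = buf[start:start + size]
    if len(data) != size:
        raise ValueError("truncated container")
    return header, version, data, start + size


# A zero-sized termination container occupies exactly the 12-byte prefix.
term = pack_container(b"TERM", 1, b"")
```

Because each container carries its own size, a reader can skip containers whose header it does not recognize by jumping to the returned next offset.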
Container format for clients without proto support (written via FlowPackager::TrackingContainerFormatToBinary and read via FlowPackager::TrackingContainerFormatFromBinary). The proto here is an intermediate format for documentation and internal use. Stores multiple TrackingContainers of different types. Meta data is stored first, to facilitate random seek (via stream offset positions) to arbitrary binary TrackingData. A termination container signals the end of the stream.
Wraps binary meta data, via custom encode.
Wraps BinaryTrackingData.
Add new TrackingContainers above before end of stream indicator. Zero sized termination container with TrackingContainer::header = "TERM".
Simplified proto format of the above TrackingContainerFormat. Instead of using self-describing TrackingContainers, we simply use the proto wire format for encoding and decoding (proto format is typed and versioned via ids).
Next flag: 9
Used in:
Tracking data is resolution independent specified w.r.t. specified domain.
Aspect ratio (w/h) of the original frame tracking data was computed from.
Total number of features in our analysis.
Average of all motion vector magnitudes (without accounting for any motion model), within 10th to 90th percentile (to remove outliers).
Background model could not be estimated.
Frame is duplicated, i.e. identical to previous one.
Indicates the beginning of a new chunk. In this case the track_id's are not compatible w.r.t. previous one.
Stores num_elements vectors of motion data. The (x, y) position is encoded via row_indices and col_starts, in compressed sparse column matrix storage format (https://en.wikipedia.org/wiki/Sparse_matrix#Compressed_sparse_column_.28CSC_or_CCS.29). Vector data is stored as (dx, dy) position. Optionally we store the fitting error and track id for each feature.
Used in:
#num_elements pairs (flow_x, flow_y) densely packed.
Stores corresponding track index for each feature. Features belonging to the same track over time are assigned the same id. NOTE: Due to size, tracking ids are never stored as compressed binary tracking data.
#num_elements row indices.
Start index in above array for each column (#domain_width + 1 entries).
Feature descriptors for num_elements feature points.
Stores all the tracked ids that have been actively discarded. This information is used by downstream consumers to avoid misjudging tracking continuity.
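To make the compressed sparse column encoding of the motion vectors concrete, the sketch below expands col_starts, row_indices and the densely packed (flow_x, flow_y) pairs into explicit (x, y, dx, dy) tuples. It assumes that columns correspond to x and rows to y, and it works on plain Python lists. The function name is hypothetical.

```python
def decode_motion_vectors(col_starts, row_indices, vector_data):
    """Expand CSC-encoded motion vectors into (x, y, dx, dy) tuples.

    col_starts has domain_width + 1 entries; the features of column x
    occupy indices col_starts[x] .. col_starts[x + 1] - 1.
    vector_data holds num_elements (flow_x, flow_y) pairs, densely packed.
    """
    features = []
    for x in range(len(col_starts) - 1):
        for i in range(col_starts[x], col_starts[x + 1]):
            y = row_indices[i]
            dx, dy = vector_data[2 * i], vector_data[2 * i + 1]
            features.append((x, y, dx, dy))
    return features


# Domain of width 3 with three features: one in column 0, two in column 2.
decoded = decode_motion_vectors(
    col_starts=[0, 1, 1, 3],
    row_indices=[2, 0, 1],
    vector_data=[0.5, -0.5, 1.0, 0.0, 0.0, 2.0],
)
```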
Set as marker for last chunk.
Set as marker for first chunk.
Used in:
Global frame index.
Corresponding timestamp.
Previous frame timestamp.
Next tag: 33
Used in:
Flow direction used internally during tracking features. Forward tracking allows reusing tracked features instead of explicitly tracking them in every frame, and can therefore be faster. See the reuse_features_XXX options below. However, if not reusing features, then it is best to match the direction for both internal tracking and output flow, for performance reasons.
Direction of flow vectors that are computed and output by calls to retrieve region flow, tracked features, etc. Note when this is BACKWARD, then the returned flow for frame N contains features tracked *from* frame N to a previous frame N-k. When this is FORWARD, the flow for frame N contains the flow from features in a previous frame N-k, tracked *to* frame N. Note that the output flow direction can only be set to FORWARD or BACKWARD.
Number of frame-pairs used for POLICY_MULTI_FRAME, ignored for other policies. Value of 1 means we are tracking features in the current frame, w.r.t. the previous one. Value of 2 denotes tracking of features in the current frame w.r.t. the previous one and the one before the previous one, etc.
Maximum length of long feature tracks for POLICY_LONG_TRACKS in frames. Note: This maximum is not strictly enforced, to avoid dropping many long tracks at the same time. Instead, if a feature reaches long_tracks_max_frames * 0.8, it will get dropped with a probability of X, where X is calculated such that 95% of all qualifying features are dropped within the interval [.8, 1.2] * long_tracks_max_frames.
Hard limit of maximum number of features. Control density of features, with min_feature_distance option. This limit is to guarantee that the run-time of RegionFlowComputation does not spiral out of control.
Fractional tracking distance w.r.t. frame diameter d. The number of pyramid levels l is chosen such that 2^l * tracking_window_size / 2 >= fractional_tracking_distance * d. Therefore, theoretically it is guaranteed that objects moving less than fractional_tracking_distance * d can be tracked.
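The inequality above determines the smallest number of pyramid levels. A minimal sketch of that computation follows, assuming the frame diameter d is the frame diagonal and that the smallest l satisfying the inequality is chosen. The function name is hypothetical.

```python
import math


def num_pyramid_levels(frame_width, frame_height, tracking_window_size,
                       fractional_tracking_distance):
    """Smallest l with 2^l * tracking_window_size / 2 >= fraction * d."""
    if tracking_window_size <= 0:
        raise ValueError("tracking_window_size must be positive")
    # Assumption: the frame diameter d is the length of the diagonal.
    diameter = math.hypot(frame_width, frame_height)
    target = fractional_tracking_distance * diameter
    levels = 0
    while (2 ** levels) * tracking_window_size / 2 < target:
        levels += 1
    return levels


# 640x480 frame (diagonal 800), window of 10 px, 15% tracking distance:
# the target is 120 px, and 2^5 * 10 / 2 = 160 is the first value >= 120.
levels = num_pyramid_levels(640, 480, 10, 0.15)
```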
If set, modifies tracking distance to be 130% of maximum average tracking distances of previous frames.
Minimum feature distance in pixels. Close features are suppressed. If value < 1, the distance is computed as a fraction of the frame diameter.
By default, when downscaling by factor x, the minimum feature distance is downscaled by a factor of sqrt(x). If set false, no scaling is performed.
Uses grid based extraction of features. Quality level is local within a grid cell and results are combined over all cells and multiple scales and grid offsets. Default option, setting it to false is deprecated and will fail.
Size of each grid cell. Values < 1 are interpreted to be relative to frame_width_ x frame_height_.
Scales / levels employed for feature extraction. Grid cell size is scaled by 0.5 for each level.
If > 1, feature extraction is carried out at multiple scales by downscaling the image repeatedly, extracting features (eigenvalue images) and upscaling them.
Alternate way of specifying extraction levels: number of levels is automatically computed by downsampling the image until its maximum dimension (width or height) reaches this value. Overrides adaptive_extraction_levels if > 0.
Grid step-size in fraction of width or height used for creating synthetic zero motion tracks with feature points lying on a grid. Can be set based on desired number of total features as 1/sqrt(num_features), e.g. .04 ~= 1/sqrt(600).
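The relation between the desired feature count and the grid step can be checked numerically. The helper name below is hypothetical; only the formula 1/sqrt(num_features) comes from the description above.

```python
import math


def grid_step_for(num_features):
    """Grid step as a fraction of width or height: 1 / sqrt(num_features)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return 1.0 / math.sqrt(num_features)


step = grid_step_for(600)  # close to the .04 quoted above
```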
If set, uses ORB features with brute force matching and ratio test to track frames across larger perspective changes than possible with default KLT features.
Only brute force matches with best_match_distance < ratio_test_threshold * second_best_match_distance are retained.
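The ratio test above can be sketched as a filter over candidate matches. The tuple layout (feature id, best distance, second-best distance) and the function names are assumptions made for illustration. The real matcher operates on ORB descriptors.

```python
def passes_ratio_test(best_match_distance, second_best_match_distance,
                      ratio_test_threshold):
    """Keep a match only if it is clearly better than the runner-up."""
    return (best_match_distance
            < ratio_test_threshold * second_best_match_distance)


def filter_matches(candidates, ratio_test_threshold):
    """candidates: iterable of (feature_id, best, second_best) tuples."""
    return [c for c in candidates
            if passes_ratio_test(c[1], c[2], ratio_test_threshold)]


# Feature 0 is distinctive (30 < 0.8 * 60); feature 1 is ambiguous.
kept = filter_matches([(0, 30.0, 60.0), (1, 50.0, 55.0)], 0.8)
```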
Refines wide baseline matches by estimating affine transform to wide-baseline matches which is used to seed initial positions for KLT matches.
When tracking features, features tracked from frame A to frame B may be reused as the features for frame B when tracking from it (instead of extracting features). The max_frame_distance flag limits the distance between A and B for the features to be reused. Setting it to 0 => no re-use.
In conjunction with the above, the features are reused in frame B only if they are at least this fraction of the original features in frame A. Otherwise they are reset and extracted from scratch.
If set, uses the newer OpenCV tracking algorithm. Recommended to be set for all new projects.
Implementation choice of KLT tracker.
Specifies the extraction method for features.
Used in:
Using Harris' approximation of EXTRACTION_MIN_EIG_VAL.
Exact smallest eigenvalue computation.
Extract using FAST feature detector.
Used in:
Threshold on the difference between the intensity of the central pixel and the pixels of a circle around this pixel. Empirically, the larger the threshold, the fewer keypoints are detected. The default value matches OpenCV's.
Describes direction of flow during feature tracking and for the output region flow.
Used in:
Tracks are forward, from frame N-k -> frame N (k > 0).
Tracks are backward, from frame N -> frame N-k (k > 0).
Try forward and backward tracking consecutively.
Used in:
Same as in MinEigValExtractionSettings.
Used in:
Use OpenCV's implementation of KLT tracker.
Settings for above corner extraction methods.
Used in:
Quality level of features (features with min_eig_value < quality_level * max_eig_value are rejected). Here [min|max]_eig_value denote the minimum and maximum eigenvalue of the auto-correlation matrix of the patch centered at a feature point. The ratio of eigenvalues denotes the "cornerness"; lower means more pronounced corners. (See http://en.wikipedia.org/wiki/Harris-Affine for details.)
Features below this quality level are always discarded, even if their score is above feature_quality_level() * local maximum within that grid cell. This prevents us from including very poor features.
Specifies how a feature is tracked w.r.t. previous or next frames (dependent on the FlowDirection options above). Per default, each frame is tracked w.r.t. a single neighboring frame (TRACK_SINGLE_FRAME). If associations across multiple frames are desired, TRACK_MULTI_FRAME creates multiple results for the current frame, by tracking features w.r.t. multiple neighbors. Number of neighbors is specified by multi_frames_to_track. If long feature tracks are desired (i.e. a track across a frame pair that is identified to belong to an earlier known feature), use TRACK_ACROSS_FRAMES. Maximum track length can be specified by long_tracks_max_frames.
Used in:
Tracks w.r.t. previous or next frame.
Tracks w.r.t. multiple frames.
Create long feature tracks.
Simple translational model: I * x + [dx; dy] with I being 2x2 identity transform.
Used in:
An arbitrary number of frames per second. Prefer the StandardFps enum to store industry-standard, safe FPS values.
Used in:
The possibly approximated value of the frame rate, in frames per second. Unsafe to use in accurate computations because prone to rounding errors. For example, the 23.976 FPS value has no exact representation as a double.
The exact value of the frame rate, as a rational number.
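The difference between the approximate and the exact representation can be illustrated with Python's fractions module. The rate usually written as 23.976 FPS is 24000/1001. The frame_time helper below is hypothetical.

```python
from fractions import Fraction

# Exact rational frame rate for what is commonly written as 23.976 FPS.
ntsc_film = Fraction(24000, 1001)

# The floating-point value is only an approximation of the rational.
approx_fps = float(ntsc_film)


def frame_time(frame_index, fps):
    """Exact timestamp in seconds of a frame, as a Fraction."""
    return frame_index / fps


# With the rational rate, 24000 frames last exactly 1001 seconds.
exact_seconds = frame_time(24000, ntsc_film)
```

Keeping the rate as a numerator and denominator pair avoids accumulating rounding error when timestamps are derived from frame indices.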
Used in:
Used in:
Coefficient applied to a new value, while `1 - alpha` is applied to a stored value. Should be in the [0, 1] range. The smaller the value, the smoother the result and the bigger the lag.
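The role of alpha can be seen in a minimal exponential smoothing sketch. The function name is hypothetical, and initializing the stored value with the first sample is an assumption.

```python
def low_pass(values, alpha):
    """Blend each new value with the stored one: alpha * new + (1 - alpha) * stored."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha should be in the [0, 1] range")
    stored = None
    smoothed = []
    for value in values:
        # Assumption: the first sample initializes the stored value.
        stored = value if stored is None else alpha * value + (1 - alpha) * stored
        smoothed.append(stored)
    return smoothed


# A step from 0 to 10 is followed with a lag at alpha = 0.5.
result = low_pass([0.0, 10.0, 10.0], alpha=0.5)  # [0.0, 5.0, 7.5]
```

With alpha = 1 the output equals the input, and smaller values of alpha trade responsiveness for smoothness.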
Default behaviour and fast way to disable smoothing.
Used in:
(message has no fields)