Get desktop application:
View/edit binary Protocol Buffers messages
An aggregate function.
Used in:
,Points to a function_anchor defined in this plan, which must refer to an aggregate function in the associated YAML file. Required; 0 is considered to be a valid anchor/reference.
The arguments to be bound to the function. This must have exactly the number of arguments specified in the function definition, and the argument types must also match exactly: - Value arguments must be bound using FunctionArgument.value, and the expression in that must yield a value of a type that a function overload is defined for. - Type arguments must be bound using FunctionArgument.type, and a function overload must be defined for that type. - Enum arguments must be bound using FunctionArgument.enum followed by Enum.specified, with a string that case-insensitively matches one of the allowed options. - Optional enum arguments must be bound using FunctionArgument.enum followed by either Enum.specified or Enum.unspecified. If specified, the string must case-insensitively match one of the allowed options.
Options to specify behavior for corner cases, or leave behavior unspecified if the consumer does not need specific behavior in these cases.
Must be set to the return type of the function, exactly as derived using the declaration in the extension.
Describes which part of the aggregation to perform within the context of distributed algorithms. Required. Must be set to INITIAL_TO_RESULT for aggregate functions that are not decomposable.
If specified, the aggregated records are ordered according to this list before they are aggregated. The first sort field has the highest priority; only if a sort field determines two records to be equivalent is the next field queried. This field is optional.
Specifies whether equivalent records are merged before being aggregated. Optional, defaults to AGGREGATION_INVOCATION_ALL.
deprecated; use arguments instead
Method in which equivalent records are merged before being aggregated.
Used in:
,This default value implies AGGREGATION_INVOCATION_ALL.
Use all values in the aggregation calculation.
Use only distinct values in the aggregation calculation.
This rel is used to create references, in case we refer to a RelRoot field names will be ignored
The relational operator representing a GROUP BY Aggregate
Used in:
Input of the aggregation
A list of expression grouping that the aggregation measured should be calculated for.
A list of one or more aggregate expressions along with an optional filter.
Used in:
Used in:
An optional boolean expression that acts to filter which records are included in the measure. True means include this record for calculation within the measure. Helps to support SUM(<c>) FILTER(WHERE...) syntax without masking opportunities for optimization
Describes which part of an aggregation or window function to perform within the context of distributed algorithms.
Used in:
,Implies `INTERMEDIATE_TO_RESULT`.
Specifies that the function should be run only up to the point of generating an intermediate value, to be further aggregated later using INTERMEDIATE_TO_INTERMEDIATE or INTERMEDIATE_TO_RESULT.
Specifies that the inputs of the aggregate or window function are the intermediate values of the function, and that the output should also be an intermediate value, to be further aggregated later using INTERMEDIATE_TO_INTERMEDIATE or INTERMEDIATE_TO_RESULT.
A complete invocation: the function should aggregate the given set of inputs to yield a single return value. This style must be used for aggregate or window functions that are not decomposable.
Specifies that the inputs of the aggregate or window function are the intermediate values of the function, generated previously using INITIAL_TO_INTERMEDIATE and possibly INTERMEDIATE_TO_INTERMEDIATE calls. This call should combine the intermediate values to yield the final return value.
Defines a set of Capabilities that a system (producer or consumer) supports.
List of Substrait versions this system supports
list of com.google.Any message types this system supports for advanced extensions.
list of simple extensions this system supports.
Used in:
Cartesian product relational operator of two tables (left and right)
Used in:
Definition of which type of object we are operating on
The columns that will be modified (representing after-image of a schema change)
The default values for the columns (representing after-image of a schema change) E.g., in case of an ALTER TABLE that changes some of the column default values, we expect the table_defaults Struct to report a full list of default values reflecting the result of applying the ALTER TABLE operator successfully
Which type of object we operate on
The type of operation to perform
The body of the CREATE VIEW
Used in:
A Table object in the system
A View object in the system
Used in:
A create operation (for any object)
A create operation if the object does not exist, or replaces it (equivalent to a DROP + CREATE) if the object already exists
An operation that modifies the schema (e.g., column names, types, default values) for the target object
An operation that removes an object from the system
An operation that removes an object from the system (without throwing an exception if the object did not exist)
Used in:
, , , , , , , , , , , , , ,Deprecated in favor of user_defined, which allows nullability and variations to be specified. If user_defined_pointer is encountered, treat it as being non-nullable and having the default variation.
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
,Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
A redistribution operation
the type of exchange used
Send all data to every target.
Used in:
(message has no fields)
The message to describe partition targets of an exchange
Used in:
Describes the partition id(s) to send. If this is empty, all data is sent to this target.
Returns zero or more bucket numbers per record
Used in:
Route approximately
Used in:
whether the round robin behavior is required to exact (per record) or approximate. Defaults to approximate.
Used in:
Returns a single bucket number per record.
Used in:
Used in:
, , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,deprecated: enum literals are only sensible in the context of function arguments, for which FunctionArgument should now be used
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
(message has no fields)
A reference to an inner part of a complex object. Can reference reference a single element or a masked version of elements
Used in:
, , ,Whether this is composed of a single element reference or a masked element subtree
Whether this reference has an origin of a root struct or is based on the ouput of an expression. When this is a RootReference and direct_reference above is used, the direct_reference must be of a type StructField.
A root reference for the outer relation's subquery
Used in:
number of subquery boundaries to traverse up for this field's reference This value must be >= 1
Singleton that expresses this FieldReference is rooted off the root incoming record type
Used in:
(message has no fields)
Used in:
Used in:
Used in:
, , , , ,Timestamp in units of microseconds since the UNIX epoch.
Date in units of days since the UNIX epoch.
Time in units of microseconds past midnight
Timestamp in units of microseconds since the UNIX epoch.
a typed null literal
whether the literal type should be treated as a nullable type. Applies to all members of union other than the Typed null (which should directly declare nullability).
optionally points to a type_variation_anchor defined in this plan. Applies to all members of union other than the Typed null (which should directly declare the type variation).
Used in:
little-endian twos-complement integer representation of complete value (ignoring precision) Always 16 bytes in length
The maximum number of digits allowed in the value. the maximum precision is 38.
declared scale of decimal literal
Used in:
Used in:
Used in:
A homogeneously typed list of literals
Used in:
Used in:
Used in:
, ,A possibly heterogeneously typed list of literals
Used in:
points to a type_anchor defined in this plan
The parameters to be bound to the type class, if the type class is parameterizable.
the value of the literal, serialized using some type-specific protobuf message
Used in:
A reference that takes an existing subtype and selectively removes fields from it. For example, one might initially have an inner struct with 100 fields but a a particular operation only needs to interact with only 2 of those 100 fields. In this situation, one would use a mask expression to eliminate the 98 fields that are not relevant to the rest of the operation pipeline. Note that this does not fundamentally alter the structure of data beyond the elimination of unecessary elements.
Used in:
,Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
, ,Used in:
Used in:
,Used in:
Used in:
Expression to dynamically construct nested types.
Used in:
Whether the returned nested type is nullable.
Optionally points to a type_variation_anchor defined in this plan for the returned nested type.
Used in:
A homogeneously-typed list of one or more expressions that form the list entries. To specify an empty list, use Literal.empty_list (otherwise type information would be missing).
Used in:
One or more key-value pairs. To specify an empty map, use Literal.empty_map (otherwise type information would be missing).
Used in:
Mandatory key/value expressions.
Used in:
Zero or more possibly heterogeneously-typed list of expressions that form the struct fields.
A way to reference the inner property of a complex record. Can reference either a map key by literal, a struct field by the ordinal position of the desired field or a particular element in an array. Supports expressions that would roughly translate to something similar to: a.b[2].c['my_map_key'].x where a,b,c and x are struct field references (ordinalized in the internal representation here), [2] is a list offset and ['my_map_key'] is a reference into a map field.
Used in:
, , ,Used in:
zero-indexed ordinal position of element in list
Optional child segment
Used in:
literal based reference to specific possible value in map.
Optional child segment
Used in:
zero-indexed ordinal position of field in struct
Optional child segment
A scalar function call.
Used in:
Points to a function_anchor defined in this plan, which must refer to a scalar function in the associated YAML file. Required; avoid using anchor/reference zero.
The arguments to be bound to the function. This must have exactly the number of arguments specified in the function definition, and the argument types must also match exactly: - Value arguments must be bound using FunctionArgument.value, and the expression in that must yield a value of a type that a function overload is defined for. - Type arguments must be bound using FunctionArgument.type. - Enum arguments must be bound using FunctionArgument.enum followed by Enum.specified, with a string that case-insensitively matches one of the allowed options.
Options to specify behavior for corner cases, or leave behavior unspecified if the consumer does not need specific behavior in these cases.
Must be set to the return type of the function, exactly as derived using the declaration in the extension.
Deprecated; use arguments instead.
Used in:
Subquery relation expression
Used in:
Scalar subquery
x IN y predicate
EXISTS/UNIQUE predicate
ANY/ALL predicate
Predicate checking that the left expression is contained in the right subquery Examples: x IN (SELECT * FROM t) (x, y) IN (SELECT a, b FROM t)
Used in:
A subquery with one row and one column. This is often an aggregate though not required to be.
Used in:
A subquery comparison using ANY or ALL. Examples: SELECT * FROM t1 WHERE x < ANY(SELECT y from t2)
Used in:
ANY or ALL
A comparison operator
left side of the expression
right side of the expression
Used in:
Used in:
A predicate over a set of rows in the form of a subquery EXISTS and UNIQUE are common SQL forms of this operation.
Used in:
TODO: should allow expressions
Used in:
Used in:
Used in:
A window function call.
Used in:
Points to a function_anchor defined in this plan, which must refer to a window function in the associated YAML file. Required; 0 is considered to be a valid anchor/reference.
The arguments to be bound to the function. This must have exactly the number of arguments specified in the function definition, and the argument types must also match exactly: - Value arguments must be bound using FunctionArgument.value, and the expression in that must yield a value of a type that a function overload is defined for. - Type arguments must be bound using FunctionArgument.type, and a function overload must be defined for that type. - Enum arguments must be bound using FunctionArgument.enum followed by Enum.specified, with a string that case-insensitively matches one of the allowed options.
Options to specify behavior for corner cases, or leave behavior unspecified if the consumer does not need specific behavior in these cases.
Must be set to the return type of the function, exactly as derived using the declaration in the extension.
Describes which part of the window function to perform within the context of distributed algorithms. Required. Must be set to INITIAL_TO_RESULT for window functions that are not decomposable.
If specified, the records that are part of the window defined by upper_bound and lower_bound are ordered according to this list before they are aggregated. The first sort field has the highest priority; only if a sort field determines two records to be equivalent is the next field queried. This field is optional, and is only allowed if the window function is defined to support sorting.
Specifies whether equivalent records are merged before being aggregated. Optional, defaults to AGGREGATION_INVOCATION_ALL.
When one or more partition expressions are specified, two records are considered to be in the same partition if and only if these expressions yield an equal tuple of values for both. When computing the window function, only the subset of records within the bounds that are also in the same partition as the current record are aggregated.
Defines the record relative to the current record from which the window extends. The bound is inclusive. If the lower bound indexes a record greater than the upper bound, TODO (null range/no records passed? wrapping around as if lower/upper were swapped? error? null?). Optional; defaults to the start of the partition.
Defines the record relative to the current record up to which the window extends. The bound is inclusive. If the upper bound indexes a record less than the lower bound, TODO (null range/no records passed? wrapping around as if lower/upper were swapped? error? null?). Optional; defaults to the end of the partition.
Deprecated; use arguments instead.
Defines one of the two boundaries for the window of a window function.
Used in:
The bound extends some number of records behind the current record.
The bound extends some number of records ahead of the current record.
The bound extends to the current record.
The bound extends to the start of the partition or the end of the partition, depending on whether this represents the upper or lower bound.
Defines that the bound extends to or from the current record.
Used in:
(message has no fields)
Defines that the bound extends this far ahead of the current record.
Used in:
A strictly positive integer specifying the number of records that the window extends ahead of the current record. Required. Use CurrentRow for offset zero and Preceding for negative offsets.
Defines that the bound extends this far back from the current record.
Used in:
A strictly positive integer specifying the number of records that the window extends back from the current record. Required. Use CurrentRow for offset zero and Following for negative offsets.
Defines an "unbounded bound": for lower bounds this means the start of the partition, and for upper bounds this means the end of the partition.
Used in:
(message has no fields)
Used in:
Field names in depth-first order
Describe a set of operations to complete. For compactness sake, identifiers are normalized at the plan level.
Substrait version of the expression. Optional up to 0.17.0, required for later versions.
a list of yaml specifications this expression may depend on
a list of extensions this expression may depend on
one or more expression trees with same order in plan rel
additional extensions associated with this expression.
A list of com.google.Any entities that this plan may use. Can be used to warn if some embedded message types are unknown. Note that this list may include message types that are ignorable (optimizations) or that are unused. In many cases, a consumer may be able to work with a plan even if one or more message types defined here are unknown.
Stub to support extension with a zero inputs
Used in:
Stub to support extension with multiple inputs
Used in:
A stub type that can be used to extend/introduce new table types outside the specification.
Used in:
,Stub to support extension with a single input
Used in:
The relational operator representing LIMIT/OFFSET or TOP type semantics.
Used in:
the offset expressed in number of records
the amount of records to return
The relational operator capturing simple FILTERs (as in the WHERE clause of SQL)
Used in:
The argument of a function
Used in:
, ,An optional function argument. Typically used for specifying behavior in invalid or corner cases.
Used in:
, ,Name of the option to set. If the consumer does not recognize the option, it must reject the plan. The name is matched case-insensitively with option names defined for the function.
List of behavior options allowed by the producer. At least one must be specified; to leave an option unspecified, simply don't add an entry to `options`. The consumer must use the first option from the list that it supports. If the consumer supports none of the specified options, it must reject the plan. The name is matched case-insensitively and must match one of the option values defined for the option.
List of function signatures available.
(message has no fields)
Used in:
, ,Used in:
Used in:
Used in:
Used in:
, ,Used in:
, ,(message has no fields)
Used in:
, ,the minimum number of arguments allowed for the list of final arguments (inclusive).
the maximum number of arguments allowed for the list of final arguments (exclusive)
the type of parameterized type consistency
Used in:
All argument must be the same concrete type.
Each argument can be any possible concrete type afforded by the bounds of any parameter defined in the arguments specification.
Used in:
, ,Used in:
Used in:
The hash equijoin join operator will build a hash table out of the right input based on a set of join keys. It will then probe that hash table for incoming inputs, finding matches.
Used in:
Used in:
The binary JOIN relational operator left-join-right, including various join types, a join condition and post_join_filter expression
Used in:
Used in:
This join is useful for nested sub-queries where we need exactly one tuple in output (or throw exception) See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf
The merge equijoin does a join by taking advantage of two sets that are sorted on the join keys. This allows the join operation to be done in a streaming fashion.
Used in:
Used in:
A base object for writing (e.g., a table or a view).
Used in:
,The list of string is used to represent namespacing (e.g., mydb.mytable). This assumes shared catalog between systems exchanging a message.
A message for modeling name/type pairs. Useful for representing relation schemas. Notes: * The names field is in depth-first order. For example a schema such as: a: int64 b: struct<c: float32, d: string> would have a `names` field that looks like: ["a", "b", "c", "d"] * Only struct fields are contained in this field's elements, * Map keys should be traversed first, then values when producing/consuming
Used in:
, , ,list of names in dfs order
Used in:
, , , , ,Deprecated in favor of user_defined, which allows nullability and variations to be specified. If user_defined_pointer is encountered, treat it as being non-nullable and having the default variation.
Used in:
, , ,Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
list of names in dfs order
Used in:
,Used in:
Used in:
Used in:
Describe a set of operations to complete. For compactness sake, identifiers are normalized at the plan level.
Substrait version of the plan. Optional up to 0.17.0, required for later versions.
a list of yaml specifications this plan may depend on
a list of extensions this plan may depend on
one or more relation trees that are associated with this plan.
additional extensions associated with this plan.
A list of com.google.Any entities that this plan may use. Can be used to warn if some embedded message types are unknown. Note that this list may include message types that are ignorable (optimizations) or that are unused. In many cases, a consumer may be able to work with a plan even if one or more message types defined here are unknown.
Either a relation or root relation
Used in:
Any relation (used for references and CTEs)
The root of a relation tree
This message type can be used to deserialize only the version of a Substrait Plan message. This prevents deserialization errors when there were breaking changes between the Substrait version of the tool that produced the plan and the Substrait version used to deserialize it, such that a consumer can emit a more helpful error message in this case.
This operator allows to represent calculated expressions of fields (e.g., a+b). Direct/Emit are used to represent classical relational projections
Used in:
The scan operator of base data (physical or virtual), including filtering and projection.
Used in:
Definition of which type of scan operation is to be performed
A stub type that can be used to extend/introduce new table types outside the specification.
Used in:
Represents a list of files in input of a scan operation
Used in:
Many files consist of indivisible chunks (e.g. parquet row groups or CSV rows). If a slice partially selects an indivisible chunk then the consumer should employ some rule to decide which slice to include the chunk in (e.g. include it in the slice that contains the midpoint of the chunk)
Used in:
A URI that can refer to either a single folder or a single file
A URI where the path portion is a glob expression that can identify zero or more paths. Consumers should support the POSIX syntax. The recursive globstar (**) may not be supported.
A URI that refers to a single file
A URI that refers to a single folder
The index of the partition this item belongs to
The start position in byte to read from this item
The length in byte to read from this item
The format of the files.
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
(message has no fields)
A base table. The list of string is used to represent namespacing (e.g., mydb.mytable). This assumes shared catalog between systems exchanging a message.
Used in:
A table composed of literals.
Used in:
A relation (used internally in a plan)
Used in:
, , , , , , , , , , , , , , , , , , , ,Physical relations
Common fields for all relational operators
Used in:
, , , , , , , , , , , , , ,The underlying relation is output as is (no reordering or projection of columns)
Allows to control for order and inclusion of fields
Direct indicates no change on presence and ordering of fields in the output
Used in:
(message has no fields)
Remap which fields are output and in which order
Used in:
Changes to the operation that can influence efficiency/performance but should not impact correctness.
Used in:
TODO: nodes, cpu threads/%, memory, iops, etc.
Used in:
The statistics related to a hint (physical properties of records)
Used in:
A relation with output field names. This is for use at the root of a `Rel` tree.
Used in:
A relation
Field names in depth-first order
The relational set operators (intersection/union/etc..)
Used in:
The first input is the primary input, the remaining are secondary inputs. There must be at least two inputs.
Used in:
The description of a field to sort on (including the direction of sorting and null semantics)
Used in:
, ,Used in:
The ORDERY BY (or sorting) relational operator. Beside describing a base relation, it includes a list of fields to sort on
Used in:
Used in:
, , , , , , , , , , ,Deprecated in favor of user_defined, which allows nullability and variations to be specified. If user_defined_type_reference is encountered, treat it as being non-nullable and having the default variation.
Used in:
, ,Used in:
, ,Used in:
, ,Used in:
Used in:
, ,Used in:
, ,Used in:
Start compound types.
Used in:
Used in:
, ,Used in:
, ,Used in:
, ,Used in:
, ,Used in:
, ,Used in:
, ,Used in:
,Used in:
,Used in:
, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,Used in:
,Explicitly null/unspecified parameter, to select the default value (if any).
Data type parameters, like the i32 in LIST<i32>.
Value parameters, like the 10 in VARCHAR<10>.
Used in:
, ,Used in:
,Used in:
, ,Used in:
, ,Used in:
, ,Used in:
, ,Used in:
Used in:
Used in:
, ,Substrait version number.
If a particular version of Substrait is used that does not correspond to a version number exactly (for example when using an unofficial fork or using a version that is not yet released or is between versions), set this to the full git hash of the utilized commit of https://github.com/substrait-io/substrait (or fork thereof), represented using a lowercase hex ASCII string 40 characters in length. The version number above should be set to the most recent version tag in the history of that commit.
Identifying information for the producer that created this plan. Under ideal circumstances, consumers should not need this information. However, it is foreseen that consumers may need to work around bugs in particular producers in practice, and therefore may need to know which producer created the plan.
The operator that modifies the content of a database (operates on 1 table at a time, but tuple-selection/source can be based on joining of multiple tables).
Definition of which TABLE we are operating on
The schema of the table (must align with Rel input (e.g., number of leaf fields must match))
The type of operation to perform
The relation that determines the tuples to add/remove/modify the schema must match with table_schema. Default values must be explicitly stated in a ProjectRel at the top of the input. The match must also occur in case of DELETE to ensure multi-engine plans are unequivocal.
Output mode determines what is the output of executing this rel
Used in:
return no tuples at all
this mode makes the operator return all the tuple INSERTED/DELETED/UPDATED by the operator. The operator returns the AFTER-image of any change. This can be further manipulated by operators upstreams (e.g., retunring the typical "count of modified tuples"). For scenarios in which the BEFORE image is required, the user must implement a spool (via references to subplans in the body of the Rel input) and return those with anounter PlanRel.relations.
Used in:
The insert of new tuples in a table
The removal of tuples from a table
The modification of existing tuples within a table
The Creation of a new table, and the insert of new tuples in the table