Main interface for the SparkConnect service.
Executes a request that contains the query and returns a stream of [[Response]]. It is guaranteed that there is at least one ARROW batch returned even if the result set is empty.
A request to be executed by the service.
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context. user_context.user_id and session_id both identify a unique remote spark session on the server side.
(Optional) Provide an id for this request. If not provided, it will be generated by the server. It is returned in every ExecutePlanResponse.operation_id of the ExecutePlan response stream. The id must be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Required) The logical plan to be executed / analyzed.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Repeated element for options that can be passed to the request. This element is currently unused but allows passing in an extension value used for arbitrary options.
Tags to attach to the given execution. Tags cannot contain the ',' character and cannot be empty strings. Used by Interrupt with interrupt.tag.
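A minimal sketch of driving this RPC directly over gRPC. The stub module path (`pyspark.sql.connect.proto`), the generated class name `SparkConnectServiceStub`, and the default port 15002 are assumptions based on typical Spark Connect client setups; the field names match the descriptions above.

    import uuid
    import grpc
    # Assumed location of the stubs generated from base.proto.
    from pyspark.sql.connect.proto import base_pb2, base_pb2_grpc

    channel = grpc.insecure_channel("localhost:15002")  # assumed default Spark Connect port
    stub = base_pb2_grpc.SparkConnectServiceStub(channel)

    request = base_pb2.ExecutePlanRequest(
        session_id=str(uuid.uuid4()),    # chosen by the client and reused for the whole session
        operation_id=str(uuid.uuid4()),  # optional; echoed back in every response
        tags=["nightly-report"],         # usable later with Interrupt(interrupt_tag)
    )
    request.user_context.user_id = "alice"
    # request.plan would carry the Relation or Command to execute (omitted here).

    # ExecutePlan is server-streaming: at least one Arrow batch arrives even for empty results.
    for response in stub.ExecutePlan(request):
        print(response.operation_id, response.response_id)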
Analyzes a query and returns an [[AnalyzeResponse]] containing metadata about the query.
Request to perform plan analysis, optionally to explain the plan.
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Response to performing analysis of the query. Contains relevant metadata to be able to reason about the performance. Next ID: 16
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
Updates or fetches the configurations and returns a [[ConfigResponse]] containing the result.
Request to update or fetch the configurations.
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context
(Required) The operation for the config.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Response to the config request. Next ID: 5
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
(Optional) The result key-value pairs. Available when the operation is 'Get', 'GetWithDefault', 'GetOption', 'GetAll'. Also available for the operation 'IsModifiable' with the boolean strings "true" and "false".
(Optional) Warning messages for deprecated or unsupported configurations.
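On the client these operations are normally reached through the session's RuntimeConfig rather than by building ConfigRequest messages by hand; a short sketch, assuming `spark` is a Spark Connect `SparkSession`:

    # Each call below maps to one ConfigRequest operation described above.
    spark.conf.set("spark.sql.shuffle.partitions", "50")        # Set
    current = spark.conf.get("spark.sql.shuffle.partitions")    # Get
    fallback = spark.conf.get("spark.nonexistent.key", "x")     # GetWithDefault
    print(spark.conf.isModifiable("spark.sql.shuffle.partitions"))  # IsModifiable
    spark.conf.unset("spark.sql.shuffle.partitions")            # Unset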
Adds artifacts to the session and returns an [[AddArtifactsResponse]] containing metadata about the added artifacts.
Request to transfer client-local artifacts.
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
User context
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
The payload is either a batch of artifacts or a partial chunk of a large artifact.
The metadata and the initial chunk of a large artifact chunked into multiple requests. The server side is notified about the total size of the large artifact as well as the number of chunks to expect.
A chunk of an artifact excluding metadata. This can be any chunk of a large artifact excluding the first chunk (which is included in `BeginChunkedArtifact`).
Response to adding an artifact. Contains relevant metadata to verify successful transfer of artifact(s). Next ID: 4
Session id in which the AddArtifact was running.
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
The list of artifact(s) seen by the server.
Checks statuses of artifacts in the session and returns them in an [[ArtifactStatusesResponse]].
Request to get current statuses of artifacts at the server side.
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
User context
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
The name of the artifact is expected in the form of a "Relative Path" that is made up of a sequence of directories and the final file element. Examples of "Relative Path"s: "jars/test.jar", "classes/xyz.class", "abc.xyz", "a/b/X.jar". The server is expected to maintain the hierarchy of files as defined by their name (i.e., the relative path of the file on the server's filesystem will be the same as the name of the provided artifact).
Response to checking artifact statuses. Next ID: 4
Session id in which the ArtifactStatus was running.
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
A map of artifact names to their statuses.
Interrupts running executions
(Required) The session_id specifies a spark session for a user id (which is specified by user_context.user_id). The session_id is set by the client to be able to collate streaming responses from different queries within the dedicated session. The id should be an UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
(Required) The type of interrupt to execute.
if interrupt_type == INTERRUPT_TYPE_TAG, interrupt operation with this tag.
if interrupt_type == INTERRUPT_TYPE_OPERATION_ID, interrupt operation with this operation_id.
Next ID: 4
Session id in which the interrupt was running.
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
Operation ids of the executions which were interrupted.
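In the PySpark Connect client these interrupt types surface as session-level helpers (method names assumed from the 3.5-era API; `op_id` is a hypothetical operation id captured from an earlier request):

    spark.addTag("etl-batch")           # tag executions started from this thread

    # ... kick off long-running actions asynchronously ...

    spark.interruptTag("etl-batch")     # INTERRUPT_TYPE_TAG: stop everything carrying the tag
    spark.interruptOperation(op_id)     # INTERRUPT_TYPE_OPERATION_ID: stop a single operation
    interrupted = spark.interruptAll()  # interrupt all running executions in the session
    print(interrupted)                  # ids of the interrupted operations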
Reattach to an existing reattachable execution. The ExecutePlan must have been started with ReattachOptions.reattachable=true. If the ExecutePlanResponse stream ends without a ResultComplete message, there is more to continue. If there is a ResultComplete, the client should use ReleaseExecute to release the execution.
(Required) The session_id of the request to reattach to. This must be an id of an existing session.
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context. user_context.user_id and session_id both identify a unique remote spark session on the server side.
(Required) Provide an id of the request to reattach to. This must be an id of an existing operation.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
(Optional) Last already processed response id from the response stream. After reattach, the server will resume the response stream after that response. If not specified, the server will restart the stream from the start. Note: the server controls the amount of responses that it buffers and it may drop responses that are far behind the latest returned response, so this cannot be used to arbitrarily scroll back the cursor. If the response is no longer available, this will result in an error.
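A rough sketch of the reattach loop described above, reusing the hypothetical stubs from the earlier ExecutePlan example. The essential point is that the client remembers the last processed `response_id` and sends it back so the server can resume the stream after it; the `result_complete` field name is an assumption based on the ResultComplete message described later.

    def execute_with_reattach(stub, request):
        # The request must have been sent with ReattachOptions.reattachable=true.
        last_response_id = None
        stream = stub.ExecutePlan(request)
        while True:
            try:
                for response in stream:
                    last_response_id = response.response_id
                    yield response
                    if response.HasField("result_complete"):
                        return  # stream fully consumed; the caller should now ReleaseExecute
            except grpc.RpcError:
                pass  # network failure: fall through and reattach
            # Stream ended or broke without ResultComplete, so there is more to fetch.
            reattach = base_pb2.ReattachExecuteRequest(
                session_id=request.session_id,
                operation_id=request.operation_id,
            )
            reattach.user_context.CopyFrom(request.user_context)
            if last_response_id is not None:
                reattach.last_response_id = last_response_id  # resume right after this response
            stream = stub.ReattachExecute(reattach)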
Release a reattachable execution, or parts thereof. The ExecutePlan must have been started with ReattachOptions.reattachable=true. Non-reattachable executions are released automatically and immediately after the ExecutePlan RPC, and ReleaseExecute may not be used on them.
(Required) The session_id of the session containing the execution to release. This must be an id of an existing session.
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
(Required) User context. user_context.user_id and session_id both identify a unique remote spark session on the server side.
(Required) Provide the id of the operation to release. This must be an id of an existing operation.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Next ID: 4
Session id in which the release was running.
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
Operation id of the operation on which the release executed. If the operation couldn't be found (because, e.g., it was concurrently released), this will be unset. Otherwise, it will be equal to the operation_id from the request.
Release a session. All the executions in the session will be released. Any further requests for the session with that session_id for the given user_id will fail. If the session didn't exist or was already released, this is a noop.
(Required) The session_id of the session to release. This must be an id of an existing session.
(Required) User context. user_context.user_id and session_id both identify a unique remote spark session on the server side.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Next ID: 3
Session id of the session on which the release executed.
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
FetchErrorDetails retrieves the matched exception with details based on a provided error id.
(Required) The session_id specifies a Spark session for a user identified by user_context.user_id. The id should be a UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`.
(Optional) Server-side generated idempotency key from the previous responses (if any). Server can use this to validate that the server side session has not changed.
User context
(Required) The id of the error.
Provides optional information about the client sending the request. This field can be used for language or version specific information and is only intended for logging purposes and will not be interpreted by the server.
Next ID: 5
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
The index of the root error in errors. The field will not be set if the error is not found.
A list of errors.
A chunk of an Artifact.
Used in:
Data chunk.
CRC to allow server to verify integrity of the chunk.
A number of `SingleChunkArtifact` batched into a single RPC.
Used in:
Signals the beginning/start of a chunked artifact. A large artifact is transferred through a payload of `BeginChunkedArtifact` followed by a sequence of `ArtifactChunk`s.
Used in:
Name of the artifact undergoing chunking. Follows the same conventions as the `name` in the `Artifact` message.
Total size of the artifact in bytes.
Number of chunks the artifact is split into. This includes the `initial_chunk`.
The first/initial chunk.
An artifact that is contained in a single `ArtifactChunk`. Generally, this message represents tiny artifacts such as REPL-generated class files.
Used in:
The name of the artifact is expected in the form of a "Relative Path" that is made up of a sequence of directories and the final file element. Examples of "Relative Path"s: "jars/test.jar", "classes/xyz.class", "abc.xyz", "a/b/X.jar". The server is expected to maintain the hierarchy of files as defined by their name (i.e., the relative path of the file on the server's filesystem will be the same as the name of the provided artifact).
A single data chunk.
Metadata of an artifact.
Used in:
Whether the CRC (Cyclic Redundancy Check) is successful on server verification. The server discards any artifact that fails the CRC. If false, the client may choose to resend the artifact specified by `name`.
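A sketch of how a client might split one large artifact across AddArtifactsRequest messages following the scheme above, reusing the hypothetical `base_pb2` module from the earlier sketches. The oneof field names (`begin_chunk`, `chunk`) and the scalar field names inside BeginChunkedArtifact are inferred from these descriptions and may differ from the actual .proto; the CRC is assumed to be a plain CRC-32.

    import zlib

    CHUNK_SIZE = 32 * 1024 * 1024  # illustrative chunk size

    def add_artifact_requests(name, data, session_id):
        chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)] or [b""]

        def make_chunk(payload):
            return base_pb2.AddArtifactsRequest.ArtifactChunk(
                data=payload, crc=zlib.crc32(payload))

        # First request carries the metadata plus the initial chunk.
        first = base_pb2.AddArtifactsRequest(session_id=session_id)
        first.begin_chunk.name = name               # e.g. "jars/test.jar"
        first.begin_chunk.total_bytes = len(data)   # total size of the artifact
        first.begin_chunk.num_chunks = len(chunks)  # includes the initial chunk
        first.begin_chunk.initial_chunk.CopyFrom(make_chunk(chunks[0]))
        yield first

        # Every following request carries one bare chunk, with no metadata.
        for payload in chunks[1:]:
            rest = base_pb2.AddArtifactsRequest(session_id=session_id)
            rest.chunk.CopyFrom(make_chunk(payload))
            yield rest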
Relation of type [[Aggregate]].
Used in:
(Required) Input relation for a RelationalGroupedDataset.
(Required) How the RelationalGroupedDataset was built.
(Required) Expressions for grouping keys
(Required) List of values that will be translated to columns in the output DataFrame.
(Optional) Pivots a column of the current `DataFrame` and performs the specified aggregation.
(Optional) List of values that will be translated to columns in the output DataFrame.
Used in:
Used in:
(Required) Individual grouping set
Used in:
(Required) The column to pivot
(Optional) List of values that will be translated to columns in the output DataFrame. Note that if it is empty, the server side will immediately trigger a job to collect the distinct values of the column.
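This is the message the DataFrame grouping and pivot API lowers to; a sketch with the public PySpark API, where `sales` is a hypothetical DataFrame. Passing explicit pivot values fills the optional value list, so the server does not have to run an extra job to collect the distinct values of the pivot column.

    from pyspark.sql import functions as F

    pivoted = (
        sales.groupBy("region")                           # grouping expressions
             .pivot("quarter", ["Q1", "Q2", "Q3", "Q4"])  # pivot column plus explicit values
             .agg(F.sum("amount"))                        # aggregate expressions
    )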
Used in:
(Required) The DDL formatted string to be parsed.
Explains the input plan based on a configurable mode.
Used in:
(Required) The logical plan to be analyzed.
(Required) For analyzePlan rpc calls, configure the mode to explain plan in strings.
Plan explanation mode.
Used in:
Generates only physical plan.
Generates parsed logical plan, analyzed logical plan, optimized logical plan and physical plan. The parsed logical plan is an unresolved plan extracted from the query. The analyzed logical plan transforms the parsed plan by resolving unresolvedAttribute and unresolvedRelation into fully typed objects. The optimized logical plan is transformed through a set of optimization rules, resulting in the physical plan.
Generates code for the statement, if any, and a physical plan.
If plan node statistics are available, generates a logical plan and also the statistics.
Generates a physical plan outline and also node details.
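These modes correspond to the explain variants exposed on the DataFrame API; a sketch using PySpark, where `df` stands for any DataFrame:

    df.explain()                  # physical plan only
    df.explain(extended=True)     # parsed, analyzed, optimized and physical plans
    df.explain(mode="codegen")    # generated code, if any, plus the physical plan
    df.explain(mode="cost")       # logical plan with node statistics, when available
    df.explain(mode="formatted")  # physical plan outline plus node details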
Used in:
(Required) The logical plan to get the storage level.
Used in:
(Required) The logical plan to be analyzed.
Used in:
(Required) The logical plan to be analyzed.
Used in:
(Required) The logical plan to be analyzed.
Used in:
(Required) The logical plan to persist.
(Optional) The storage level.
Returns `true` when the logical query plans are equal and therefore return the same results.
Used in:
(Required) The plan to be compared.
(Required) The other plan to be compared.
Used in:
(Required) The logical plan to be analyzed.
Used in:
(Required) The logical plan to get a hashCode.
Used in:
(message has no fields)
Used in:
(Required) The logical plan to be analyzed.
(Optional) Max level of the schema.
Used in:
(Required) The logical plan to unpersist.
(Optional) Whether to block until all blocks are deleted.
Used in:
Used in:
Used in:
(Required) The StorageLevel as a result of get_storage_level request.
Used in:
A best-effort snapshot of the files that compose this Dataset
Used in:
Used in:
Used in:
(message has no fields)
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
(message has no fields)
Used in:
(Required) Input relation for applyInPandasWithState.
(Required) Expressions for grouping keys.
(Required) Input user-defined function.
(Required) Schema for the output DataFrame.
(Required) Schema for the state.
(Required) The output mode of the function.
(Required) Timeout configuration for groups that do not receive data for a while.
Used in:
Whether or not the particular artifact exists at the server.
Relation of type [[AsOfJoin]]. `left` and `right` must be present.
Used in:
(Required) Left input relation for a Join.
(Required) Right input relation for a Join.
(Required) Field to join on in left DataFrame
(Required) Field to join on in right DataFrame
(Optional) The join condition. Could be unset when `using_columns` is utilized. This field does not co-exist with using_columns.
Optional. using_columns provides a list of columns that should be present on both sides of the join inputs that this Join will join on. For example, A JOIN B USING col_name is equivalent to A JOIN B ON A.col_name = B.col_name. This field does not co-exist with join_condition.
(Required) The join type.
(Optional) The asof tolerance within this range.
(Required) Whether to allow matching with the same value or not.
(Required) Whether to search for prior, subsequent, or closest matches.
See `spark.catalog.cacheTable`
Used in:
(Required)
(Optional)
A local relation that has been cached already.
Used in:
(Required) A sha-256 hash of the serialized local relation in proto, see LocalRelation.
Represents a remote relation that has been cached on server.
Used in:
(Required) ID of the remote relation (assigned by the service).
Used in:
(Required) Unparsed name of the SQL function.
(Optional) Function arguments. Empty arguments are allowed.
Catalog messages are marked as unstable.
Used in:
Used in:
(Required) The logical plan to checkpoint.
(Required) Whether to locally checkpoint using a local temporary directory on the Spark Connect server (Spark driver).
(Required) Whether to checkpoint this dataframe immediately.
Used in:
(Required) The logical plan checkpointed.
See `spark.catalog.clearCache`
Used in:
(message has no fields)
Used in:
(Required) One input relation for CoGroup Map API - applyInPandas.
Expressions for grouping keys of the first input relation.
(Required) The other input relation.
Expressions for grouping keys of the other input relation.
(Required) Input user-defined function.
(Optional) Expressions for sorting. Only used by Scala Sorted CoGroup Map API.
(Optional) Expressions for sorting. Only used by Scala Sorted CoGroup Map API.
Collect arbitrary (named) metrics from a dataset.
Used in:
(Required) The input relation.
(Required) Name of the metrics.
(Required) The metric sequence.
A [[Command]] is an operation that is executed by the server that does not directly consume or produce a relational result.
Used in:
This field is used to mark extensions to the protocol. When plugins generate arbitrary Commands they can add them here. During the planning the correct resolution is done.
Used in:
(Required) Name of the data source.
(Required) The data source type.
Used in:
(Required) Name of the user-defined function.
(Optional) Indicate if the user-defined function is deterministic.
(Optional) Function arguments. Empty arguments are allowed.
(Required) Indicate the function type of the user-defined function.
Used in:
(Required) Name of the user-defined table function.
(Optional) Whether the user-defined table function is deterministic.
(Optional) Function input arguments. Empty arguments are allowed.
(Required) Type of the user-defined table function.
Used in:
(Required) The config keys to get.
Used in:
(Optional) The prefix of the config key to get.
Used in:
(Required) The config keys to get optionally.
Used in:
(Required) The config key-value pairs to get. The value will be used as the default value.
Used in:
(Required) The config keys to check the config is modifiable.
Used in:
Used in:
(Required) The config key-value pairs to set.
Used in:
(Required) The config keys to unset.
A command that can create DataFrame global temp view or local temp view.
Used in:
(Required) The relation that this view will be built on.
(Required) View name.
(Required) Whether this is global temp view or local temp view.
(Required) If true, and if the view already exists, updates it; if false, and if the view already exists, throws exception.
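In the DataFrame API this command is produced by the temp-view helpers; a small sketch, where `df` is a placeholder DataFrame:

    df.createOrReplaceTempView("events")      # local temp view, replace = true
    df.createGlobalTempView("events_global")  # global temp view, fails if it already exists
    spark.sql("SELECT count(*) FROM events").show()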
See `spark.catalog.createExternalTable`
Used in:
(Required)
(Optional)
(Optional)
(Optional)
Options may be empty for a valid data source format. The map key is case insensitive.
Command to create ResourceProfile
Used in:
(Required) The ResourceProfile to be built on the server-side.
Response for command 'CreateResourceProfileCommand'.
Used in:
(Required) Server-side generated resource profile id.
See `spark.catalog.createTable`
Used in:
(Required)
(Optional)
(Optional)
(Optional)
(Optional)
Options may be empty for a valid data source format. The map key is case insensitive.
See `spark.catalog.currentCatalog`
Used in:
(message has no fields)
See `spark.catalog.currentDatabase`
Used in:
(message has no fields)
This message describes the logical [[DataType]] of something. It does not carry the value itself but only describes it.
Used in:
Numeric types
String types
Datetime types
Interval types
Complex types
UserDefinedType
UnparsedDataType
Used in:
Used in:
Used in:
Used in:
Used in:
Start compound types.
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
(Required) The unparsed data type string
Used in:
Used in:
Used in:
See `spark.catalog.databaseExists`
Used in:
(Required)
Relation of type [[Deduplicate]] which has duplicate rows removed. It can consider either only a subset of the columns or all of the columns.
Used in:
(Required) Input relation for a Deduplicate.
(Optional) Deduplicate based on a list of column names. This field cannot be used together with `all_columns_as_keys`.
(Optional) Deduplicate based on all the columns of the input relation. This field cannot be used together with `column_names`.
(Optional) Deduplicate within the time range of watermark.
Drop specified columns.
Used in:
(Required) The input relation.
(Optional) columns to drop.
(Optional) names of columns to drop.
See `spark.catalog.dropGlobalTempView`
Used in:
(Required)
See `spark.catalog.dropTempView`
Used in:
(Required)
Used in:
Extension type for request options
The response of a query, can be one or more for each request. Responses belonging to the same input query, carry the same `session_id`. Next ID: 17
Used as response type in: SparkConnectService.ExecutePlan, SparkConnectService.ReattachExecute
Server-side generated idempotency key that the client can use to assert that the server side session has not changed.
Identifies the ExecutePlan execution. If set by the client in ExecutePlanRequest.operationId, that value is returned. Otherwise generated by the server. It is a UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
Identifies the response in the stream. The id is a UUID string of the format `00112233-4455-6677-8899-aabbccddeeff`
Union type for the different response messages.
Special case for executing SQL commands.
Response for a streaming query.
Response for commands on a streaming query.
Response for 'SparkContext.resources'.
Response for commands on the streaming query manager.
Response for commands on the client side streaming query listener.
Response type informing if the stream is complete in reattachable execution.
Response for command that creates ResourceProfile.
(Optional) Intermediate query progress reports.
Response for command that checkpoints a DataFrame.
Support arbitrary result objects.
Metrics for the query execution. Typically, this field is only present in the last batch of results and then represents the overall state of the query execution.
The metrics observed during the execution of the query plan.
(Optional) The Spark schema. This field is available when `collect` is called.
Batch results of metrics.
Used in:
Count of rows in `data`. Must match the number of rows inside `data`.
Serialized Arrow data.
If set, row offset of the start of this ArrowBatch in execution results.
This message is used to communicate progress about the query progress during the execution.
Used in:
Captures the progress of each individual stage.
Captures the currently in progress tasks.
Used in:
Used in:
Used in:
Used in:
Used in:
If present, in a reattachable execution this means that after the server sends onComplete, the execution is complete. If the server sends onComplete without sending a ResultComplete, it means that there is more to come, and the client should use the ReattachExecute RPC to continue.
Used in:
(message has no fields)
A SQL command returns an opaque Relation that can be directly used as input for the next call.
Used in:
An executor resource request.
Used in:
(Required) The resource name.
(Required) The amount of the resource being requested.
Optional script used to discover the resources.
Optional vendor, required for some cluster managers.
Expression used to refer to fields, functions and similar. This can be used everywhere expressions in SQL appear.
Used in:
This field is used to mark extensions to the protocol. When plugins generate arbitrary relations they can add them here. During the planning the correct resolution is done.
Used in:
(Required) The expression that the alias will be added on.
(Required) A list of name parts for the alias. Scalar columns only have one name part.
(Optional) Alias metadata expressed as a JSON map.
Used in:
(Required) The expression to be cast.
(Required) The data type that the expression is to be cast to.
If this is set, the server will use the Catalyst parser to parse this string to a DataType.
(Optional) The expression evaluation mode.
Used in:
Expression as string.
Used in:
(Required) A SQL expression that will be parsed by Catalyst parser.
Used in:
(Required) The lambda function. The function body should use 'UnresolvedAttribute' as arguments; the server side will replace 'UnresolvedAttribute' with 'UnresolvedNamedLambdaVariable'.
(Required) Function variables. Must contain 1 to 3 variables.
Used in:
Date in units of days since the UNIX epoch.
Timestamp in units of microseconds since the UNIX epoch.
Timestamp in units of microseconds since the UNIX epoch (without timezone information).
Used in:
Used in:
Used in:
The string representation.
The maximum number of digits allowed in the value. The maximum precision is 38.
The declared scale of the decimal literal.
Used in:
Used in:
SortOrder is used to specify the data ordering; it is normally used in Sort and Window. It is an unevaluable expression and cannot be evaluated, so it cannot be used in a Projection.
Used in:
(Required) The expression to be sorted.
(Required) The sort direction, should be ASCENDING or DESCENDING.
(Required) How to deal with NULLs, should be NULLS_FIRST or NULLS_LAST.
Used in:
Used in:
An unresolved attribute that is not explicitly bound to a specific column, but the column is resolved during analysis by name.
Used in:
(Required) An identifier that will be parsed by Catalyst parser. This should follow the Spark SQL identifier syntax.
(Optional) The id of corresponding connect plan.
(Optional) The requested column is a metadata column.
Extracts a value or values from an Expression
Used in:
(Required) The expression to extract value from, can be Map, Array, Struct or array of Structs.
(Required) The expression to describe the extraction, can be key of Map, index of Array, field name of Struct.
An unresolved function is not explicitly bound to one explicit function, but the function is resolved during analysis following Spark's name resolution rules.
Used in:
(Required) name (or unparsed name for user defined function) for the unresolved function.
(Optional) Function arguments. Empty arguments are allowed.
(Required) Indicate if this function should be applied on distinct values.
(Required) Indicate if this is a user defined function. When it is not a user defined function, Connect will use the function name directly. When it is a user defined function, Connect will parse the function name first.
Used in:
(Required) A list of name parts for the variable. Must not be empty.
Represents all of the input attributes to a given relational operator, for example in "SELECT `(id)?+.+` FROM ...".
Used in:
(Required) The column name used to extract column with regex.
(Optional) The id of corresponding connect plan.
UnresolvedStar is used to expand all the fields of a relation or struct.
Used in:
(Optional) The target of the expansion. If set, it should end with '.*' and will be parsed by 'parseAttributeName' in the server side.
(Optional) The id of corresponding connect plan.
Add, replace or drop a field of `StructType` expression by name.
Used in:
(Required) The struct expression.
(Required) The field name.
(Optional) The expression to add or replace. When not set, it means this field will be dropped.
Expression for the OVER clause or WINDOW clause.
Used in:
(Required) The window function.
(Optional) The way that input rows are partitioned.
(Optional) Ordering of rows in a partition.
(Optional) Window frame in a partition. If not set, it will be treated as 'UnspecifiedFrame'.
The window frame
Used in:
(Required) The type of the frame.
(Required) The lower bound of the frame.
(Required) The upper bound of the frame.
Used in:
CURRENT ROW boundary
UNBOUNDED boundary. For lower bound, it will be converted to 'UnboundedPreceding'. For upper bound, it will be converted to 'UnboundedFollowing'.
This is an expression for future proofing. We are expecting literals on the server side.
Used in:
RowFrame treats rows in a partition individually.
RangeFrame treats rows in a partition as groups of peers. All rows having the same 'ORDER BY' ordering are considered as peers.
Used in:
(Required) Keep the information of the origin for this expression such as stacktrace.
Error defines the schema for the representing exception.
Used in:
The fully qualified names of the exception class and its parent classes.
The detailed message of the exception.
The stackTrace of the exception. It will be set if the SQLConf spark.sql.connect.serverStacktrace.enabled is true.
The index of the cause error in errors.
The structured data of a SparkThrowable exception.
QueryContext defines the schema for the query context of a SparkThrowable. It helps users understand where the error occurs while executing queries.
Used in:
The object type of the query which throws the exception. If the exception is directly from the main query, it should be an empty string. Otherwise, it should be the exact object type in upper case. For example, a "VIEW".
The object name of the query which throws the exception. If the exception is directly from the main query, it should be an empty string. Otherwise, it should be the object name. For example, a view name "V1".
The starting index in the query text which throws the exception. The index starts from 0.
The stopping index in the query which throws the exception. The index starts from 0.
The corresponding fragment of the query which throws the exception.
The user code (call site of the API) that caused throwing the exception.
Summary of the exception cause.
The type of this query context.
Used in:
SparkThrowable defines the schema for SparkThrowable exceptions.
Used in:
Succinct, human-readable, unique, and consistent representation of the error category.
The message parameters for the error framework.
The query context of a SparkThrowable.
Portable error identifier across SQL engines. If null, the error class or SQLSTATE is not set.
Used in:
The fully qualified name of the class containing the execution point.
The name of the method containing the execution point.
The name of the file containing the execution point.
The line number of the source line containing the execution point.
Relation that applies a boolean expression `condition` on each row of `input` to produce the output result.
Used in:
(Required) Input relation for a Filter.
(Required) A Filter must have a condition expression.
See `spark.catalog.functionExists`
Used in:
(Required)
(Optional)
See `spark.catalog.getDatabase`
Used in:
(Required)
See `spark.catalog.getFunction`
Used in:
(Required)
(Optional)
Command to get the output of 'SparkContext.resources'
Used in:
(message has no fields)
Response for command 'GetResourcesCommand'.
Used in:
See `spark.catalog.getTable`
Used in:
(Required)
(Optional)
Used in:
(Required) Input relation for Group Map API: apply, applyInPandas.
(Required) Expressions for grouping keys.
(Required) Input user-defined function.
(Optional) Expressions for sorting. Only used by Scala Sorted Group Map API.
The fields below are only used by (Flat)MapGroupsWithState. (Optional) Input relation for the initial state.
(Optional) Expressions for grouping keys of the initial state input relation.
(Optional) True if MapGroupsWithState, false if FlatMapGroupsWithState.
(Optional) The output mode of the function.
(Optional) Timeout configuration for groups that do not receive data for a while.
Specify a hint over a relation. Hint should have a name and optional parameters.
Used in:
(Required) The input relation.
(Required) Hint name. Supported Join hints include BROADCAST, MERGE, SHUFFLE_HASH, SHUFFLE_REPLICATE_NL. Supported partitioning hints include COALESCE, REPARTITION, REPARTITION_BY_RANGE.
(Optional) Hint parameters.
Compose the string representing rows for output. It will invoke 'Dataset.htmlString' to compute the results.
Used in:
(Required) The input relation.
(Required) Number of rows to show.
(Required) If set to more than 0, truncates strings to `truncate` characters and all cells will be aligned right.
Used in:
Interrupt all running executions within the session with the provided session_id.
Interrupt all running executions within the session with the provided operation_tag.
Interrupt the running execution within the session with the provided operation_id.
See `spark.catalog.isCached`
Used in:
(Required)
Used in:
(Required) Fully qualified name of Java class
(Optional) Output type of the Java UDF
(Required) Indicate if the Java user-defined function is an aggregate function
Relation of type [[Join]]. `left` and `right` must be present.
Used in:
(Required) Left input relation for a Join.
(Required) Right input relation for a Join.
(Optional) The join condition. Could be unset when `using_columns` is utilized. This field does not co-exist with using_columns.
(Required) The join type.
Optional. using_columns provides a list of columns that should be present on both sides of the join inputs that this Join will join on. For example, A JOIN B USING col_name is equivalent to A JOIN B ON A.col_name = B.col_name. This field does not co-exist with join_condition.
(Optional) Only used by joinWith. Set the left and right join data types.
Used in:
If the left data type is a struct.
If the right data type is a struct.
Used in:
The key-value pair for the config request and response.
Used in:
(Required) The key.
(Optional) The value.
Relation of type [[Limit]] that is used to `limit` rows from the input relation.
Used in:
(Required) Input relation for a Limit.
(Required) the limit.
See `spark.catalog.listCatalogs`
Used in:
(Optional) The pattern that the catalog name needs to match
See `spark.catalog.listColumns`
Used in:
(Required)
(Optional)
See `spark.catalog.listDatabases`
Used in:
(Optional) The pattern that the database name needs to match
See `spark.catalog.listFunctions`
Used in:
(Optional)
(Optional) The pattern that the function name needs to match
See `spark.catalog.listTables`
Used in:
(Optional)
(Optional) The pattern that the table name needs to match
A relation that does not need to be qualified by name.
Used in:
(Optional) Local collection data serialized into Arrow IPC streaming format which contains the schema of the data.
(Optional) The schema of local data. It should be either a DDL-formatted type string or a JSON string. The server side will update the column names and data types according to this schema. If the 'data' is not provided, then this schema will be required.
Used in:
(Required) Input relation for a mapPartitions-equivalent API: mapInPandas, mapInArrow.
(Required) Input user-defined function.
(Optional) Whether to use barrier mode execution or not.
(Optional) ResourceProfile id used for the stage level scheduling.
Used in:
(Required) The action type of the merge action.
(Optional) The condition expression of the merge action.
(Optional) The assignments of the merge action. Required for ActionTypes INSERT and UPDATE.
Used in:
Used in:
(Required) The key of the assignment.
(Required) The value of the assignment.
Used in:
(Required) The name of the target table.
(Required) The relation of the source table.
(Required) The condition to match the source and target.
(Optional) The actions to be taken when the condition is matched.
(Optional) The actions to be taken when the condition is not matched.
(Optional) The actions to be taken when the condition is not matched by source.
(Required) Whether to enable schema evolution.
Drop rows containing null values. It will invoke 'Dataset.na.drop' (same as 'DataFrameNaFunctions.drop') to compute the results.
Used in:
(Required) The input relation.
(Optional) Optional list of column names to consider. When it is empty, all the columns in the input relation will be considered.
(Optional) The minimum number of non-null and non-NaN values required to keep. When not set, it is equivalent to the number of considered columns, which means a row will be kept only if all columns are non-null. 'how' options ('all', 'any') can be easily converted to this field: 'all' -> set 'min_non_nulls' to 1; 'any' -> keep 'min_non_nulls' unset.
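The public API's 'how' and 'thresh' arguments map onto this field as described above; a sketch, where `df` is a placeholder DataFrame:

    df.na.drop(how="any")                    # min_non_nulls left unset: every column must be non-null
    df.na.drop(how="all")                    # min_non_nulls = 1: keep rows with at least one non-null
    df.na.drop(thresh=2, subset=["a", "b"])  # min_non_nulls = 2 over the listed columns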
Replaces null values. It will invoke 'Dataset.na.fill' (same as 'DataFrameNaFunctions.fill') to compute the results. The following 3 parameter combinations are supported: 1) 'values' contains only 1 item and 'cols' is empty: replaces null values in all type-compatible columns. 2) 'values' contains only 1 item and 'cols' is not empty: replaces null values in the specified columns. 3) 'values' contains more than 1 item: 'cols' is required to have the same length, and each specified column is replaced with the corresponding value.
Used in:
(Required) The input relation.
(Optional) Optional list of column names to consider.
(Required) Values to replace null values with. Should contain at least 1 item. Only 4 data types are supported now: bool, long, double, string
Replaces old values with the corresponding values. It will invoke 'Dataset.na.replace' (same as 'DataFrameNaFunctions.replace') to compute the results.
Used in:
(Required) The input relation.
(Optional) List of column names to consider. When it is empty, all the type-compatible columns in the input relation will be considered.
(Optional) The value replacement mapping.
Used in:
(Required) The old value. Only 4 data types are supported now: null, bool, double, string.
(Required) The new value. Should be of the same data type with the old value.
Used in:
(Required) The key of the named argument.
(Required) The value expression of the named argument.
Relation of type [[Offset]] that is used to read rows starting from the `offset` on the input relation.
Used in:
(Required) Input relation for an Offset.
(Required) the limit.
Used in:
,(Required) Indicate the origin type.
Used in:
(Required) Input relation to Parse. The input is expected to have a single text column.
(Required) The expected format of the text.
(Optional) DataType representing the schema. If not set, Spark will infer the schema.
Options for the csv/json parser. The map key is case insensitive.
Used in:
A [[Plan]] is the structure that carries the runtime information for the execution from the client to the server. A [[Plan]] can either be of the type [[Relation]] which is a reference to the underlying logical plan or it can be of the [[Command]] type that is used to execute commands on the server.
Used in:
Projection of a bag of expressions for a given input relation. The input relation must be specified. The projected expression can be an arbitrary expression.
Used in:
(Optional) Input relation is optional for Project. For example, `SELECT ABS(-1)` is valid plan without an input plan.
(Required) A Project requires at least one expression.
Used in:
(Required) The encoded commands of the Python data source.
(Required) Python version being used in the client.
Used in:
(Required) Name of the origin, for example, the name of the function
(Required) Callsite to show to end users, for example, stacktrace.
Used in:
(Required) Output type of the Python UDF
(Required) EvalType of the Python UDF
(Required) The encoded commands of the Python UDF
(Required) Python version being used in the client.
(Optional) Additional includes for the Python UDF.
Used in:
(Optional) Return type of the Python UDTF.
(Required) EvalType of the Python UDTF.
(Required) The encoded commands of the Python UDTF.
(Required) Python version being used in the client.
Relation of type [[Range]] that generates a sequence of integers.
Used in:
(Optional) Default value = 0
(Required)
(Required)
Optional. Default value is assigned by 1) SQL conf "spark.sql.leafNodeDefaultParallelism" if it is set, or 2) spark default parallelism.
Relation that reads from a file / table or other data source. Does not have additional inputs.
Used in:
(Optional) Indicates if this is a streaming read.
Used in:
(Optional) Supported formats include: parquet, orc, text, json, csv, avro. If not set, the value from SQL conf 'spark.sql.sources.default' will be used.
(Optional) If not set, Spark will infer the schema. This schema string should be either DDL-formatted or JSON-formatted.
Options for the data source. The context of this map varies based on the data source format. These options may be empty for a valid data source format. The map key is case insensitive.
(Optional) A list of paths for file-system backed data sources.
(Optional) Condition in the where clause for each partition. This is only supported by the JDBC data source.
Used in:
(Required) Unparsed identifier for the table.
Options for the named table. The map key is case insensitive.
Used in:
If true, the request can be reattached to using ReattachExecute. ReattachExecute can be used either if the stream broke with a GRPC network error, or if the server closed the stream without sending a response with StreamStatus.complete=true. The server will keep a buffer of responses in case a response is lost, and ReattachExecute needs to back-track. If false, the execution response stream will not be reattachable, and all responses are immediately released by the server after being sent.
See `spark.catalog.recoverPartitions`
Used in:
(Required)
See `spark.catalog.refreshByPath`
Used in:
(Required)
See `spark.catalog.refreshTable`
Used in:
(Required)
The main [[Relation]] type. Fundamentally, a relation is a typed container that has exactly one explicit relation type set. When adding new relation types, they have to be registered here.
Used in:
NA functions
stat functions
Catalog API (experimental / unstable)
This field is used to mark extensions to the protocol. When plugins generate arbitrary relations they can add them here. During the planning the correct resolution is done.
Common metadata of all relations.
Used in:
(Required) Shared relation metadata.
(Optional) A per-client globally unique id for a given connect plan.
(Optional) Keep the information of the origin for this expression such as stacktrace.
Release and close the operation completely. This will also interrupt the query if it is still running, and wait for it to be torn down.
Used in:
(message has no fields)
Release all responses from the operation response stream up to and including the response with the given response_id. While the server determines by itself how much of a buffer of responses to keep, the client providing explicit release calls will help reduce resource consumption. Noop if the response_id is not found in the cached responses.
Used in:
Command to remove `CachedRemoteRelation`
Used in:
(Required) The cached remote relation to be removed.
Relation repartition.
Used in:
(Required) The input relation of Repartition.
(Required) Must be positive.
(Optional) Default value is false.
Used in:
(Required) The input relation.
(Required) The partitioning expressions.
(Optional) number of partitions, must be positive.
ResourceInformation to hold information about a type of Resource. The corresponding class is 'org.apache.spark.resource.ResourceInformation'
Used in:
(Required) The name of the resource
(Required) An array of strings describing the addresses of the resource.
Used in:
(Optional) Resource requests for executors. Mapped from the resource name (e.g., cores, memory, CPU) to its specific request.
(Optional) Resource requests for tasks. Mapped from the resource name (e.g., cores, memory, CPU) to its specific request.
Relation that uses a SQL query to generate the output.
Used in:
(Required) The SQL query.
(Optional) A map of parameter names to literal expressions.
(Optional) A sequence of literal expressions for positional parameters in the SQL query text.
(Optional) A map of parameter names to expressions. It cannot coexist with `pos_arguments`.
(Optional) A sequence of expressions for positional parameters in the SQL query text. It cannot coexist with `named_arguments`.
Relation of type [[Sample]] that samples a fraction of the dataset.
Used in:
(Required) Input relation for a Sample.
(Required) lower bound.
(Required) upper bound.
(Optional) Whether to sample with replacement.
(Required) The random seed. This field is required to avoid generating mutable dataframes (see SPARK-48184 for details); however, it is still kept 'optional' here for backward compatibility.
(Required) Explicitly sort the underlying plan to make the ordering deterministic or cache it. This flag is true when invoking `dataframe.randomSplit` to randomly split the DataFrame with the provided weights. Otherwise, it is false.
Used in:
(Required) Serialized JVM object containing UDF definition, input encoders and output encoder
(Optional) Input type(s) of the UDF
(Required) Output type of the UDF
(Required) True if the UDF can return null value
(Required) Indicate if the UDF is an aggregate function
See `spark.catalog.setCurrentCatalog`
Used in:
(Required)
See `spark.catalog.setCurrentDatabase`
Used in:
(Required)
Relation of type [[SetOperation]]
Used in:
(Required) Left input relation for a Set operation.
(Required) Right input relation for a Set operation.
(Required) The Set operation type.
(Optional) Whether to preserve duplicate rows: true to preserve all results, false to remove duplicate rows.
(Optional) Whether to perform the Set operation based on name resolution. Only UNION supports this option.
(Optional) Whether to perform the Set operation and allow missing columns. Only UNION supports this option.
Used in:
Compose the string representing rows for output. It will invoke 'Dataset.showString' to compute the results.
Used in:
(Required) The input relation.
(Required) Number of rows to show.
(Required) If set to more than 0, truncates strings to `truncate` characters and all cells will be aligned right.
(Required) If set to true, prints output rows vertically (one line per column value).
Relation of type [[Sort]].
Used in:
(Required) Input relation for a Sort.
(Required) The ordering expressions
(Optional) if this is a global sort.
A SQL Command is used to trigger the eager evaluation of SQL commands in Spark. When the SQL provided as part of the message is a command, it will be immediately evaluated and the result will be collected and returned as part of a LocalRelation. If the result is not a command, the operation will simply return a SQL Relation. This allows the client to be almost oblivious to the server-side behavior.
Used in:
(Required) SQL Query.
(Optional) A map of parameter names to literal expressions.
(Optional) A sequence of literal expressions for positional parameters in the SQL query text.
(Optional) A map of parameter names to expressions. It cannot coexist with `pos_arguments`.
(Optional) A sequence of expressions for positional parameters in the SQL query text. It cannot coexist with `named_arguments`.
(Optional) The relation that this SQL command will be built on.
Calculates the approximate quantiles of numerical columns of a DataFrame. It will invoke 'Dataset.stat.approxQuantile' (same as 'StatFunctions.approxQuantile') to compute the results.
Used in:
(Required) The input relation.
(Required) The names of the numerical columns.
(Required) A list of quantile probabilities. Each number must belong to [0, 1]. For example 0 is the minimum, 0.5 is the median, 1 is the maximum.
(Required) The relative target precision to achieve (greater than or equal to 0). If set to zero, the exact quantiles are computed, which could be very expensive. Note that values greater than 1 are accepted but give the same result as 1.
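This corresponds to DataFrame.approxQuantile in PySpark; a sketch, where `df` is a placeholder DataFrame:

    # Quartiles of "price" with a 1% relative error target.
    quartiles = df.approxQuantile("price", [0.25, 0.5, 0.75], 0.01)

    # A relative error of 0.0 computes exact quantiles, which can be very expensive.
    exact_median = df.approxQuantile(["price"], [0.5], 0.0)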
Calculates the correlation of two columns of a DataFrame. Currently only supports the Pearson Correlation Coefficient. It will invoke 'Dataset.stat.corr' (same as 'StatFunctions.pearsonCorrelation') to compute the results.
Used in:
(Required) The input relation.
(Required) The name of the first column.
(Required) The name of the second column.
(Optional) Default value is 'pearson'. Currently only supports the Pearson Correlation Coefficient.
Calculate the sample covariance of two numerical columns of a DataFrame. It will invoke 'Dataset.stat.cov' (same as 'StatFunctions.calculateCov') to compute the results.
Used in:
(Required) The input relation.
(Required) The name of the first column.
(Required) The name of the second column.
Computes a pair-wise frequency table of the given columns. Also known as a contingency table. It will invoke 'Dataset.stat.crosstab' (same as 'StatFunctions.crossTabulate') to compute the results.
Used in:
(Required) The input relation.
(Required) The name of the first column. Distinct items will make the first item of each row.
(Required) The name of the second column. Distinct items will make the column names of the DataFrame.
Computes basic statistics for numeric and string columns, including count, mean, stddev, min, and max. If no columns are given, this function computes statistics for all numerical or string columns.
Used in:
(Required) The input relation.
(Optional) Columns to compute statistics on.
Finding frequent items for columns, possibly with false positives. It will invoke 'Dataset.stat.freqItems' (same as 'StatFunctions.freqItems') to compute the results.
Used in:
(Required) The input relation.
(Required) The names of the columns to search frequent items in.
(Optional) The minimum frequency for an item to be considered `frequent`. Should be greater than 1e-4.
Returns a stratified sample without replacement based on the fraction given on each stratum. It will invoke 'Dataset.stat.sampleBy' to compute the results.
Used in:
(Required) The input relation.
(Required) The column that defines strata.
(Required) Sampling fraction for each stratum. If a stratum is not specified, we treat its fraction as zero.
(Required) The random seed. This field is required to avoid generating mutable dataframes (see SPARK-48184 for details); however, it is still kept 'optional' here for backward compatibility.
Used in:
(Required) The stratum.
(Required) The fraction value. Must be in [0, 1].
Computes specified statistics for numeric and string columns. It will invoke 'Dataset.summary' (same as 'StatFunctions.summary') to compute the results.
Used in:
(Required) The input relation.
(Optional) Statistics to be computed. Available statistics are: count, mean, stddev, min, max, arbitrary approximate percentiles specified as a percentage (e.g. 75%), count_distinct, approx_count_distinct. If no statistics are given, this function computes 'count', 'mean', 'stddev', 'min', approximate quartiles (percentiles at 25%, 50%, and 75%), and 'max'.
StorageLevel for persisting Datasets/Tables.
Used in:
(Required) Whether the cache should use disk or not.
(Required) Whether the cache should use memory or not.
(Required) Whether the cache should use off-heap or not.
(Required) Whether the cached data is deserialized or not.
(Required) The number of replicas.
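The same five flags make up PySpark's StorageLevel, so a level passed to persist maps directly onto this message; a sketch, where `df` is a placeholder DataFrame:

    from pyspark import StorageLevel

    # Constructor arguments mirror the fields: (useDisk, useMemory, useOffHeap, deserialized, replication)
    two_replicas = StorageLevel(True, True, False, False, 2)

    df.persist(two_replicas)     # or a predefined level such as StorageLevel.MEMORY_AND_DISK
    df.unpersist(blocking=True)  # block until all blocks are deleted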
Used in:
Commands for a streaming query.
Used in:
(Required) Query instance. See `StreamingQueryInstanceId`.
See documentation for the corresponding API method in StreamingQuery.
status() API.
lastProgress() API.
recentProgress() API.
stop() API. Stops the query.
processAllAvailable() API. Waits until all the available data is processed.
explain() API. Returns logical and physical plans.
exception() API. Returns the exception in the query if any.
awaitTermination() API. Waits for the termination of the query.
Used in:
Used in:
TODO: Consider reusing Explain from AnalyzePlanRequest message. We cannot do this right now since base.proto imports this file.
Response for commands on a streaming query.
Used in:
(Required) Query instance id. See `StreamingQueryInstanceId`.
Used in:
Used in:
(Optional) Exception message as string, maps to the return value of the original StreamingQueryException's toString method
(Optional) Exception error class as string
(Optional) Exception stack trace as string
Used in:
Logical and physical plans as string
Used in:
Progress reports as an array of json strings.
Used in:
See documentation for the Scala 'StreamingQueryStatus' struct
The enum used for the client-side streaming query listener event. There is no QueryStartedEvent defined here; it is added as a field in WriteStreamOperationStartResult.
Used in:
A tuple that uniquely identifies an instance of streaming query run. It consists of `id` that persists across the streaming runs and `run_id` that changes between each run of the streaming query that resumes from the checkpoint.
Used in:
(Required) The unique id of this query that persists across restarts from checkpoint data. That is, this id is generated when a query is started for the first time, and will be the same every time it is restarted from checkpoint data.
(Required) The unique id of this run of the query. That is, every start/restart of a query will generate a unique run_id. Therefore, every time a query is restarted from checkpoint, it will have the same `id` but different `run_id`s.
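Both ids are visible on a running StreamingQuery; a sketch, where `df` is a placeholder streaming DataFrame and the checkpoint path is hypothetical:

    query = (df.writeStream
               .format("memory")
               .queryName("events")
               .option("checkpointLocation", "/tmp/ckpt")  # hypothetical path
               .start())

    print(query.id)     # stable across restarts from the same checkpoint
    print(query.runId)  # regenerated on every start/restart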
The protocol for client-side StreamingQueryListener. This command will only be set when either the first listener is added to the client, or the last listener is removed from the client. The add_listener_bus_listener command will only be set true in the first case. The remove_listener_bus_listener command will only be set true in the second case.
Used in:
The protocol for the returned events in the long-running response channel.
Used in:
(Required) The json serialized event, all StreamingQueryListener events have a json method
(Required) Query event type used by client to decide how to deserialize the event_json
Used in:
Commands for the streaming query manager.
Used in:
See documentation for the corresponding API method in StreamingQueryManager.
active() API, returns a list of active queries.
get() API, returns the StreamingQuery identified by id.
awaitAnyTermination() API, wait until any query terminates or timeout.
resetTerminated() API.
addListener API.
removeListener API.
listListeners() API, returns a list of streaming query listeners.
Used in:
(Optional) The waiting time in milliseconds to wait for any query to terminate.
Used in:
Response for commands on the streaming query manager.
Used in:
Used in:
Used in:
Used in:
(Required) Reference IDs of listener instances.
Used in:
(Required) The id and runId of this query.
(Optional) The name of this query.
Relation alias.
Used in:
(Required) The input relation of SubqueryAlias.
(Required) The alias.
(Optional) Qualifier of the alias.
See `spark.catalog.tableExists`
Used in:
(Required)
(Optional)
Relation of type [[Tail]] that is used to fetch `limit` rows from the last of the input relation.
Used in:
(Required) Input relation for a Tail.
(Required) the limit.
A task resource request.
Used in:
(Required) The resource name.
(Required) The amount of the resource being requested, as a double to support fractional resource requests.
Rename columns on the input relation using a list of new names of the same length as the number of columns.
Used in:
(Required) The input relation of RenameColumnsBySameLengthNames.
(Required) The number of columns of the input relation must be equal to the length of this field. If this is not true, an exception will be returned.
Used in:
(Required) The input relation.
(Required) The user provided schema. The server side will update the dataframe with this schema.
Transpose a DataFrame, switching rows to columns. Transforms the DataFrame such that the values in the specified index column become the new columns of the DataFrame.
Used in:
(Required) The input relation.
(Optional) A list of columns that will be treated as the indices. Only a single column is supported now.
Used in:
(Required) The aggregate function object packed into bytes.
See `spark.catalog.uncacheTable`
Used in:
(Required)
Used for testing purposes only.
Used in:
(message has no fields)
Unpivot a DataFrame from wide format to long format, optionally leaving identifier columns set.
Used in:
(Required) The input relation.
(Required) Id columns.
(Optional) Value columns to unpivot.
(Required) Name of the variable column.
(Required) Name of the value column.
Used in:
User Context is used to refer to one particular user session that is executing queries in the backend.
Used in:
To extend the existing user context message that is used to identify incoming requests, Spark Connect leverages the Any protobuf type that can be used to inject arbitrary other messages into this message. Extensions are stored as a `repeated` type to be able to handle multiple active extensions.
Adding columns or replacing the existing columns that have the same names.
Used in:
(Required) The input relation.
(Required) Given a column name, apply the corresponding expression on the column. If the column name exists in the input relation, then it replaces the column. If the column name does not exist in the input relation, then it is added as a new column. Only one name part is expected from each Expression.Alias. An exception is thrown when duplicated names are present in the mapping.
Rename columns on the input relation by a map with name to name mapping.
Used in:
(Required) The input relation.
(Optional) Renaming column names of the input relation from A to B where A is the map key and B is the map value. This is a no-op if the schema doesn't contain any A. It does not require all input relation column names to be present as keys. Duplicated B values are not allowed.
Used in:
(Required) The existing column name.
(Required) The new column name.
Relation of type [[WithRelations]]. This relation contains a root plan, and one or more references that are used by the root plan. There are two ways of referencing a relation, by name (through a subquery alias), or by plan_id (using RelationCommon.plan_id). This relation can be used to implement CTEs, describe DAGs, or to reduce tree depth.
Used in:
(Required) Plan at the root of the query tree. This plan is expected to contain one or more references. Those references get expanded later on by the engine.
(Required) Plans referenced by the root plan. Relations in this list are also allowed to contain references to other relations in this list, as long as they do not form cycles.
Used in:
(Required) The input relation
(Required) Name of the column containing event time.
(Required)
As writes are not directly handled during analysis and planning, they are modeled as commands.
Used in:
(Required) The output of the `input` relation will be persisted according to the options.
(Optional) Format value according to the Spark documentation. Examples are: text, parquet, delta.
(Optional) The destination of the write operation can be either a path or a table. If the destination is neither a path nor a table, such as jdbc and noop, the `save_type` should not be set.
(Required) the save mode.
(Optional) List of columns to sort the output by.
(Optional) List of columns for partitioning.
(Optional) Bucketing specification. Bucketing must set the number of buckets and the columns to bucket by.
(Optional) A list of configuration options.
(Optional) Columns used for clustering the table.
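The fields above mirror the DataFrameWriter builder; a sketch, where `df` is a placeholder DataFrame (bucketing requires saving to a table rather than a path):

    (df.write
       .format("parquet")                # source / format
       .mode("overwrite")                # save mode
       .partitionBy("year", "month")     # partitioning columns
       .bucketBy(16, "user_id")          # bucketing: number of buckets plus bucket columns
       .sortBy("user_id")                # sort columns
       .option("compression", "snappy")  # options
       .saveAsTable("events"))           # table destination; use .save(path) for a path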
Used in:
Used in:
Used in:
(Required) The table name.
(Required) The method to be called to write to the table.
Used in:
As writes are not directly handled during analysis and planning, they are modeled as commands.
Used in:
(Required) The output of the `input` relation will be persisted according to the options.
(Required) The destination of the write operation must be either a path or a table.
(Optional) A provider for the underlying output data source. Spark's default catalog supports "parquet", "json", etc.
(Optional) List of columns for partitioning for output table created by `create`, `createOrReplace`, or `replace`
(Optional) A list of configuration options.
(Optional) A list of table properties.
(Required) Write mode.
(Optional) A condition for overwrite saving mode
(Optional) Columns used for clustering the table.
Used in:
Starts write stream operation as streaming query. Query ID and Run ID of the streaming query are returned.
Used in:
(Required) The output of the `input` streaming relation will be written.
The destination is optional. When set, it can be a path or a table name.
(Optional) Columns used for clustering the table.
Used in:
(Required) Query instance. See `StreamingQueryInstanceId`.
An optional query name.
Optional query started event if there is any listener registered on the client side.