Error status returned by any RPC method. Every RPC method which could generate an application-level error should have this (or a more complex error result) as an optional field in its response. This maps to yb::Status in C++.
Used in:
For backward compatibility, error_codes is also filled, but errors takes priority if both are set.
Stores encoded error codes of Status. See Status for details.
Used in:
Optional/additional fields that are collected in addition to the wait statement. Depending on the wait-state, we may choose to persist at most one of these into the yb_active_universe_history.wait_event_aux field.
Used in:
Used in:
The root-level request id, i.e. the request id corresponding to the call from the application to YSQL-Pg/YCQL. If set, it should be 16 bytes in length.
The client which sent the request to YSQL-Pg/YCQL.
The address family of the client host.
The uuid of the local tserver which received the request from the client, i.e. the local tserver running the YSQL-Pg backend/YCQL proxy. If set, it should be 16 bytes in length.
The request id for the "current" rpc. This may be different from the root-level request id; e.g. consider a write-rpc sent by the local TServer to the destination TServer.
Query id as seen on pg_stat_statements to identify identical normalized queries.
Pid of the YSQL/YCQL backend which is executing the query, or the TServer pid for background activities.
PG database OID. This will be 0 for YCQL.
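The 16-byte length rule stated for the root request id and the local tserver uuid can be checked with a few lines. This is a minimal sketch, assuming that an unset field is represented as empty bytes; the function name `is_valid_id` is hypothetical and not part of the original schema.

```python
import uuid


def is_valid_id(raw: bytes) -> bool:
    """Return True when the id is unset (assumed: empty bytes) or exactly 16 bytes."""
    return len(raw) in (0, 16)


# A UUID serialises to exactly 16 bytes, which matches the documented length.
example_id = uuid.uuid4().bytes
```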
Used in:
Physical time at which the new config should be applied.
Used in:
An element found in a container metadata file of the log-backed block storage implementation. Each one tracks the existence (creation) or non-existence (deletion) of a particular block. They are written sequentially, with subsequent messages taking precedence over earlier ones (e.g. "CREATE foo" followed by "DELETE foo" means that block foo does not exist).
The unique identifier of the block.
Whether this is a CREATE or a DELETE.
The time at which this block record was created, expressed in terms of microseconds since the epoch.
The offset of the block in the container data file. Required for CREATE.
The length of the block in the container data file. Required for CREATE.
The kind of record.
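The precedence rule for block records (later records override earlier ones) can be illustrated by replaying a record list in order. This is a sketch only: the `BlockRecord` class, its field names, and the `replay` function are hypothetical stand-ins, not the actual block manager code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class BlockRecord:
    block_id: str
    op: str          # "CREATE" or "DELETE"
    offset: int = 0  # only meaningful for CREATE
    length: int = 0  # only meaningful for CREATE


def replay(records: Iterable[BlockRecord]) -> Dict[str, Tuple[int, int]]:
    """Replay records sequentially; return live blocks as id -> (offset, length)."""
    live: Dict[str, Tuple[int, int]] = {}
    for rec in records:
        if rec.op == "CREATE":
            live[rec.block_id] = (rec.offset, rec.length)
        elif rec.op == "DELETE":
            # A later DELETE takes precedence over an earlier CREATE.
            live.pop(rec.block_id, None)
        else:
            raise ValueError(f"unknown record kind: {rec.op}")
    return live
```

Replaying "CREATE foo" followed by "DELETE foo" leaves foo absent from the result, which matches the example in the description.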
Used in:
Used in:
CDC SDK Consistent Snapshot Options
Used in:
Client does not wish to consume snapshot
Client wishes to consume snapshot from the source universe
Client wishes to export snapshot to be used in other session
This message should be used for specifying stream creation options for CDCSDK streams.
Used in:
For clarification of field meaning see comments of appropriate fields in YBTransaction::Impl
Used in:
For clarification of field meaning see comments of appropriate fields in YBTransaction::Impl
Used in:
Used in:
Used in:
Used in:
Note: an absent value means NULL.
Colocated YSQL user tables use 4-byte ID.
Colocated YSQL system tables use 16-byte UUID.
TODO: Differentiate between the schema attributes that are only relevant to the server (e.g., encoding and compression) and those that also matter to the client.
Used in:
Added pg_type_oid for decoding the column value for CDC in 2.13. If the cluster has been upgraded, the value will be default and not the expected value.
The JSON attribute was mistakenly placed here, but it is not used. To be safe, it's left here for now. JSON attribute (for c->'a'->>'b' case).
If set, it means that this column is in process of being dropped by an ongoing DDL transaction. If that DDL transaction commits, this column will be dropped, otherwise this field will be unset again.
Supplemental protobuf container header, after the main header (see pb_util.h for details).
The protobuf schema for the messages expected in this container. This schema is complete, that is, it includes all of its dependencies (i.e. other schemas defined in .proto files imported by this schema's .proto file).
The PB message type expected in each data entry in this container. Must be fully qualified (e.g. yb.tablet.RaftGroupReplicaSuperBlockPB).
Used in:
We send the datum as QLValuePB if the client is Walsender. The Walsender does the conversion from QLValuePB to the PG datum.
This is applicable only for YSQL table columns.
Used in:
For delete, scan_type is currently either primary key lookup or range scan.
Used in:
Required. Column Id.
Required. Hybrid Time when column was deleted on this tablet.
See ql_rocksdb_storage.cc for details on each method.
Used in:
TODO(get_table_key_ranges): deprecate after/if separate GetTabletKeyRanges RPC to the tablet leader is used.
Used in:
Captures the state of a Histogram.
Used in:
Used in:
This message contains the metadata of a secondary index of a table. It maps the index::columns to the expressions of table::columns.
Notes on where to find metadata of an INDEX:
- Each INDEX is represented by IndexInfo and IndexTable. IndexInfoPB contains the definitions of the INDEX. IndexTable contains duplicate user-data for quick access.
- The IndexInfoPB is stored in the Catalog::Table being indexed.
- The TablePropertiesPB is kept in Catalog::IndexTable. Internally, Catalog::IndexTable is just the same as any Catalog::Table.
Used in:
Index table id.
Indexed table id.
Index table's schema version.
Whether the index is a local index.
Whether the index is a unique index.
We should only have this in the elements of "repeated IndexInfoPB indexes" of the SysTablesEntryPB of the main table.
Indexed and covering columns.
Number of hash columns in the index.
Number of range columns in the index.
Hash column ids in the indexed table.
Range column ids in the indexed table.
The mangled-name flag is kept on both IndexInfo and IndexTable as the same mangled-name is used in both IndexInfo and IndexTable columns.
Newer index has mangled name.
When backfill fails with error, the first message that it failed with.
Should backfill be deferred.
Where clause for partial index.
When the backfill job on this index is cleared, this field is set to the number of indexed table rows processed by the backfill job. The (default) value is zero, otherwise. For partial indexes, this includes non-matching rows of the indexed table.
Index column mapping. "colexpr" is used to compute the value of this column in an INDEX.
- When a table is indexed by expressions, we create internal/hidden columns to store the index value, and "value_expr" specifies the indexing expression.
- As of 07/2019, only QLJsonColumnOperationsPB is allowed for "colexpr".
- In the current index design & implementation, an expression can only reference ONE column.
Example for scalar index: TABLE (a, b, c), INDEX (c) -> INDEX is a table whose column 'c' is referencing TABLE(c); colexpr = ref to "c" column_id.
Example for JSON index: TABLE (a, b, j), INDEX (j->>'b') -> INDEX is a table whose column 'j->>b' is referencing TABLE(j); colexpr = j->'b'
Used in:
Column id in the index table.
Generated column name in the index table.
Corresponding column id in indexed table.
Column value in INDEX.
Used in:
Columns referenced in the where_expr.
Indexes, when created or deleted, go through a series of steps to add a) delete, b) write and c) read permissions one by one. Backfill is done before the READ_WRITE_AND_DELETE state. If the backfill succeeds, the read permission is granted. If not, the given permissions are removed one by one, and the index will be deleted when it is unused. If the backfill completes successfully, the index will be in the READ_WRITE_AND_DELETE state, and remain so until the user deletes it. If an index is dropped, it will move from the READ_WRITE_AND_DELETE state through the WRITE_AND_DELETE_WHILE_REMOVING and DELETE_ONLY_WHILE_REMOVING states to INDEX_UNUSED.
Used in:
This is the "success" state, where the index is completely usable.
Used while removing an index -- either due to backfill failure, or due to a client requested "drop index".
Used as a sentinel value.
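The drop path described for index permissions can be sketched as a linear walk over the four states the description names. The list `DROP_PATH` and the helper `next_state_on_drop` are hypothetical illustrations; the states used during index creation are not named in this text and are therefore left out.

```python
# States named in the description, in the order an index passes through them when dropped.
DROP_PATH = [
    "READ_WRITE_AND_DELETE",
    "WRITE_AND_DELETE_WHILE_REMOVING",
    "DELETE_ONLY_WHILE_REMOVING",
    "INDEX_UNUSED",
]


def next_state_on_drop(state: str) -> str:
    """Remove one permission at a time; INDEX_UNUSED is terminal."""
    position = DROP_PATH.index(state)  # raises ValueError for states outside the drop path
    return DROP_PATH[min(position + 1, len(DROP_PATH) - 1)]
```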
Used in:
When any server initializes a new filesystem (e.g. a new node is created in the cluster), it creates this structure and stores it persistently.
The UUID which is assigned when the instance is first formatted. This uniquely identifies the node in the cluster.
Human-readable string indicating when and where the node was first initialized.
For all newly created clusters after this field is set, initialize this variable to true. For existing clusters this will be false. This will be used on the sys catalog restore path to indicate whether we want to run initdb.
Tserver specific metadata.
Used in:
Used in:
Used in:
The -> operator applied to a column.
The ->> operator applied to a column.
Used in:
TODO(pglocks): Omit prefix locks, which are implied to exist if their fully-qualified forms are present.
If this is set, then the lock has been granted.
Populated only when the lock belongs to a colocated table. When set, TabletLockInfoPB::pg_table_id should be ignored.
Used in:
Uniquely identify a particular instance of a particular server in the cluster.
Used in:
Unique ID which is created when the server is first started up. This is stored persistently on disk.
Sequence number incremented on every start-up of the server. This makes it easy to detect when an instance has restarted (and thus can be assumed to have forgotten any soft state it had in memory). On a freshly initialized server, the first sequence number should be 0.
The time point when the tserver/master starts up, derived from Env::NowMicros().
An id for a generic state machine operation. Composed of the leader's term plus the index of the operation in that term, e.g., the <index>th operation of the <term>th leader.
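The restart-detection idea behind the sequence number can be shown in a few lines. This is a sketch under stated assumptions: the `Instance` tuple and its field names `uuid` and `seqno` are hypothetical stand-ins for the unique ID and sequence number described above.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    uuid: str   # permanent ID, created when the server is first started
    seqno: int  # incremented on every start-up; 0 on a freshly initialized server


def has_restarted(last_seen: Instance, current: Instance) -> bool:
    """A higher sequence number for the same server means it restarted and lost soft state."""
    if last_seen.uuid != current.uuid:
        raise ValueError("instances belong to different servers")
    return current.seqno > last_seen.seqno
```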
Used in:
The term of an operation or the leader's sequence id.
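Because an operation id is composed of a term plus an index within that term, comparing two ids amounts to comparing (term, index) pairs. The `OpId` tuple below is a hypothetical illustration of that ordering, assuming that the term is compared first and the index second.

```python
from typing import NamedTuple


class OpId(NamedTuple):
    term: int   # the leader's term
    index: int  # position of the operation within that term


# NamedTuple instances compare field by field, so term is compared before index.
ops = [OpId(2, 1), OpId(1, 7), OpId(2, 0)]
latest = max(ops)
```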
The possible order modes for clients. Clients specify these in new scan requests. Ordered scans are fault-tolerant, and can be retried elsewhere in the case of tablet server failure. However, ordered scans impose additional overhead since the tablet server needs to sort the result rows.
This is the default order mode.
The serialized format of a YB table partition.
Used in:
The hash buckets of the partition. The number of hash buckets must match the number of hash bucket components in the partition's schema.
The encoded start partition key (inclusive).
The encoded end partition key (exclusive).
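The inclusive start and exclusive end keys define a half-open range. A minimal membership check might look as follows, assuming encoded keys compare as raw bytes and that an empty end key means the partition is unbounded above; both points are assumptions not stated in the text, and `partition_contains` is a hypothetical helper.

```python
def partition_contains(start: bytes, end: bytes, key: bytes) -> bool:
    """Check start <= key < end; an empty end key is assumed to mean 'no upper bound'."""
    return key >= start and (end == b"" or key < end)
```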
The serialized format of a YB table partition schema.
Used in:
A column identifier for partition schemas. In general, the name will be used when a client creates the table since column IDs are assigned by the master. All other uses of partition schemas will use the numeric column ID.
Used in:
Used in:
Column identifiers of columns included in the hash. Every column must be a component of the primary key.
Number of buckets into which columns will be hashed. Must be at least 2.
Seed value for hash calculation. Administrators may set a seed value on a per-table basis in order to randomize the mapping of rows to buckets. Setting a seed provides some amount of protection against denial of service attacks when the hash bucket columns contain user provided input.
The hash algorithm to use for calculating the hash bucket.
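The way a seed changes the mapping of rows to buckets can be sketched as below. CRC32 is used here purely as a stand-in hash and is not claimed to be the algorithm the system uses; `bucket_for` and its parameters are hypothetical.

```python
import zlib
from typing import Sequence


def bucket_for(column_values: Sequence[bytes], num_buckets: int, seed: int = 0) -> int:
    """Hash the encoded column values, mixed with the seed, into one of num_buckets."""
    if num_buckets < 2:
        raise ValueError("num_buckets must be at least 2")
    # Start the running checksum from the seed so different seeds give different mappings.
    checksum = zlib.crc32(seed.to_bytes(4, "big"))
    for value in column_values:
        checksum = zlib.crc32(value, checksum)
    return checksum % num_buckets
```

With an unknown per-table seed, a client that controls the hashed column values cannot easily predict which bucket a row lands in, which is the protection the seed description refers to.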
Used in:
Used in:
Used in:
Column identifiers of columns included in the range. All columns must be a component of the primary key.
Split points for range tables.
Field "split_rows" is only available in beta version for SPLIT AT feature. This feature had not been used, so the release version for SPLIT AT does not need to be compatible with the beta version in the past releases.
Used in:
Each range column will be assigned a lower bound.
A filesystem instance can contain multiple paths. One of these structures is persisted in each path when the filesystem instance is created.
Describes this path instance as well as all of the other path instances that, taken together, describe a complete set.
Textual representation of the block manager that formatted this path.
Block size on the filesystem where this instance was created. If the instance (and its data) are ever copied to another location, the block size in that location must be the same.
Describes a collection of filesystem path instances and the membership of a particular instance in the collection. In a healthy filesystem (see below), a path instance can be referred to via its UUID's position in all_uuids instead of via the UUID itself. This is useful when there are many such references, as the position is much shorter than the UUID.
Used in:
Globally unique identifier for this path instance.
All UUIDs in this path instance set. In a healthy filesystem: 1. There exists an on-disk PathInstanceMetadataPB for each listed UUID, and 2. Every PathSetPB contains an identical copy of all_uuids.
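The two health conditions and the position-based reference can be sketched with plain dictionaries. The representation (a mapping from each on-disk instance UUID to its copy of all_uuids) and the helper names are assumptions made for illustration.

```python
from typing import Dict, List


def position_of(all_uuids: List[str], uuid: str) -> int:
    """Refer to a path instance by its position in all_uuids instead of its UUID."""
    return all_uuids.index(uuid)


def is_healthy(path_sets: Dict[str, List[str]]) -> bool:
    """path_sets maps each on-disk instance UUID to the all_uuids list it stores."""
    if not path_sets:
        return False
    copies = list(path_sets.values())
    first = copies[0]
    # Condition 2: every path set holds an identical copy of all_uuids.
    if any(copy != first for copy in copies):
        return False
    # Condition 1: an on-disk instance exists for each listed UUID.
    return set(first) == set(path_sets)
```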
The possible roles for peers.
Used in:
Indicates this node is a follower in the configuration, i.e. that it participates in majorities and accepts Consensus::Update() calls.
Indicates this node is the current leader of the configuration, i.e. that it participates in majorities and accepts Consensus::Append() calls.
New peers joining a quorum will be in this role for both PRE_VOTER and PRE_OBSERVER while the tablet data is being remote bootstrapped. The peer does not participate in starting elections or majorities.
Indicates that this node is not a participant of the configuration, i.e. does not accept Consensus::Update() or Consensus::Append() and cannot participate in elections or majorities. This is usually the role of a node that leaves the configuration.
This peer is a read (async) replica and gets informed of the quorum write activity and provides time-line consistent reads.
Suffixed with PERMISSION, because Google does not allow the same enum name CREATE.
Used in:
To ensure compatibility between release versions, the numeric values of these datatypes cannot be changed once the types are implemented and released. Make sure this is in sync with YbcPgDataType in ybc_pg_typedefs.h.
Used in:
TUPLE is not yet fully implemented, but it is a CQL type.
All unsigned datatypes will be removed from QL because databases do not have these types.
Used in:
Used in:
Used in:
Available replica identity modes for use in CDC.
Used in:
Entire updated row as new image, only key as old image for DELETE. The name DEFAULT is taken from PG; however, it is not the default replica identity mode.
Both old and new images of the entire row.
No old image for any operation.
Only the changed columns as new image, no old image except for DELETE. This is the default replica identity mode in YSQL.
Used in:
Used in:
Used in:
Permanent index id. It should be unique within a tablet, but when a tablet is cloned, the index id in the new tablet remains the same.
Used in:
Used in:
Used in:
Expressions.
Builtin call expression. There are 3 different calls.
- Builtin operators such as '>', '<', '=', ... These operators can be executed anywhere.
- Builtin functions such as Now(). These functions can be executed anywhere.
- Server builtin functions. Only tablet servers can execute these functions.
TODO(neil) Regular builtin operators. This message can be executed anywhere.
- This is more efficient than builtin call as it avoids most overheads of calling builtin lib.
- Merge the current condition operator execution with this.
- To optimize certain operations (such as +), replace its builtin function with a builtin op.
Used in:
Given: master --chunk--> tserver --batch--> postgres --page--> tserver The backfill spec is updated and passed around for each batch and page.
[batch] Minimum number of table rows to backfill in this batch (see issue #10649). Stays constant for each batch and all of its pages. -1 means no limit.
[page] Number of table rows backfilled so far in this batch. Updated every page.
[batch] The next row key to read for a backfill batch. Defined/used similar to how it is used in paging state. All the intermediate pages of the batch have this unset. An example is easier to understand:
- tserver --batch--> postgres: next_row_key = FOO
- postgres --page--> tserver: next_row_key = FOO
- postgres <--page-- tserver: next_row_key = unset; PgsqlPagingStatePB.next_row_key = BAR
- postgres --page--> tserver: next_row_key = unset; PgsqlPagingStatePB.next_row_key = BAR
- postgres <--page-- tserver: next_row_key = unset; PgsqlPagingStatePB.next_row_key = BAZ
- postgres --page--> tserver: next_row_key = unset; PgsqlPagingStatePB.next_row_key = BAZ
- postgres <--page-- tserver: next_row_key = QUX; PgsqlPagingStatePB.next_row_key = QUX
- tserver <--batch-- postgres: next_row_key = QUX
This message defines an argument in a batch request from PgGate to DocDB. Instead of sending many requests of different arguments, a batch request would send one request that contains an array of independent arguments. DocDB will iterate the array to execute. DML_request(arg) [n] ---> DML_request ( PgsqlBatchArgument args[n] )
Used in:
The order number of this argument in a batch. Currently, this is only used for debugging, but it might be needed in the future.
This attribute is used when fetching by rowid (ybctid). PgGate fetches "base_ctid" from SecondaryIndex and uses it to select Targets from UserTable: SELECT <Targets> FROM <UserTable> WHERE ybctid IN (SELECT base_ybctid FROM <SecondaryIndex> WHERE <key_condition>)
Hash codes.
Partition values.
Range values.
Represents a column referenced by a PGSQL request.
- column_id is a DocDB identifier of the column and used to locate a value within a row.
- attno is a Postgres identifier, must be specified if column is referenced by a PG expression.
- typid, typmod are needed to convert DocDB values to Postgres format (Datum and isnull pair). They must be specified if used in an expression evaluated by PGgate.
- collid is for future collations support.
Used in:
Column messages.
ColumnRefs is a list of columns to be read by DocDB before a PGSQL request can be executed. DEPRECATED: new code should use the col_refs field of type PgsqlColRefPB instead.
Used in:
ColumnValue is a value to be assigned to a table column by DocDB while executing a PGSQL request. Currently, this is used for the SET clause: SET column-of-given-id = expr
Used in:
A logical condition that evaluates to true/false. Used in the WHERE clause.
Used in:
An expression in a WHERE condition.
- Bind values are given by the client and grouped into a repeated field that can be accessed by their indexes.
- Alias values are computed by the server and grouped into a repeated field that can be accessed by their indexes.
- The code generator writes indexes as refs. The executor dereferences the indexes to get the actual values.
Used in:
Bind variable index.
Alias index.
Regular builtin calls.
Tablet server builtin calls.
Builtin operator calls.
Logical condition that evaluates to true/false.
Tuple expression in LHS.
Used in:
Used in:
Client info
required
All locks if not specified
Lock or Unlock
Used in:
Make sure this is in sync with YbcPgMetricsCaptureType in ybc_pg_typedefs.h.
Used in:
Paging state for continuing a read request.
For a SELECT statement that returns many rows, the client may specify how many rows to return at most in each fetch. This paging state maintains the state for returning the next set of rows of the statement. This paging state is opaque to the client.
When there should be more rows to return from the same tablet in the next fetch, "next_row_key" is populated in DocDB (PgsqlReadOperation) with the DocKey of the next row to read. We also embed a hybrid-time which is the clean snapshot time for read consistency. We also populate the "next_partition_key" for the next row, which is the hash code of the hash portion of the DocKey. This next partition key is needed by YBClient (Batcher) to locate the tablet to send the request to, because it doesn't have access to the DocDB function to decode and extract from the DocKey.
When we are done returning rows from the current tablet and the next fetch should continue in the next tablet (possible only for full-table query across tablets), "next_partition_key" is populated by the current tablet with its exclusive partition-end key, which is the start key of the next tablet's partition. "next_row_key" is empty in this case, which means we will start from the very beginning of the next tablet. (TODO: we need to return the clean snapshot time in this case also).
Used in:
Table UUID to verify the same table still exists when continuing in the next fetch.
Partition key to find the tablet server of the next row to read.
The row key (SubDocKey = [DocKey + HybridTimestamp]) of the next row to read.
Running total number of rows read across fetches so far. Needed to ensure we read up to the number of rows in the SELECT's LIMIT clause across fetches.
For selects with IN condition on the hash columns there are multiple partitions that need to be queried, one for each combination of allowed values for the hash columns. This holds the index of the next partition and is used to resume the read from the right place.
Used read time.
Paging state fields for geometric indexes.
Value of the next geometric index row that can be used as a key into the main RocksDB store.
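How a running row total keeps a paged read within the SELECT's LIMIT can be sketched with an in-memory stand-in for the tablet. Everything here is hypothetical: `fetch_page` simulates one fetch and returns the position of the next row (playing the role of "next_row_key"), and `read_all` plays the client side.

```python
from typing import List, Optional, Tuple


def fetch_page(rows: List[int], start: int, page_size: int) -> Tuple[List[int], Optional[int]]:
    """Return up to page_size rows from `start`, plus the next position or None when done."""
    page = rows[start:start + page_size]
    next_row = start + len(page)
    return page, (next_row if next_row < len(rows) else None)


def read_all(rows: List[int], page_size: int, limit: int) -> List[int]:
    """Fetch page by page, never exceeding `limit` rows across all fetches."""
    result: List[int] = []
    next_row: Optional[int] = 0
    total_rows_read = 0  # running total carried across fetches
    while next_row is not None and total_rows_read < limit:
        # Each fetch asks for the smaller of the page size and the rows still allowed.
        wanted = min(page_size, limit - total_rows_read)
        page, next_row = fetch_page(rows, next_row, wanted)
        result.extend(page)
        total_rows_read += len(page)
    return result
```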
Boundary value.
Used in:
Boundary partition key.
Indicate whether or not the boundary is inclusive.
PgsqlRSColDesc is the descriptor of a selected column in a ResultSet (RS), which can be any expression and not just table columns.
Used in:
Descriptor of a row in a resultset (RS).
Used in:
YB_TODO(upgrade): Handle in pg15 upgrade path.
Used in:
TODO(neil) The protocol for select needs to be changed accordingly when we introduce and cache execution plan in tablet server.
Used in:
Client info
required
Statement info. There's only SELECT, so we don't need different request type.
required
Table id.
required
Table schema version
required
Hash Code: A hash code is used to efficiently locate a group of data with the same hash code.
- First, it is used to find the tablet server.
- Then, it is used again by DocDB to narrow the search within a tablet.
- In general, hash_code should be set by PgGate, but for some reasons, when the field "ybctid_column_value" is used, hash_code is set by the "common" lib.
Primary key.
- Partition columns are used to compute the hash code, e.g. for h1 = 1 AND h2 = 2 partition_column_values would be [1, 2].
- Range columns combined with partition columns are used for indexing.
For select using local secondary index: this request selects the ybbasectids to fetch the rows in place of the primary key above.
Where clause condition. DEPRECATED in favor of where_clauses.
TODO(alex): This should be renamed to make it clear that it's ONLY related to range columns! Conditions for range columns.
If this attribute is present, this request is a batch request.
Output Argument Specification (Tuple descriptor). For now, we send the rsrow descriptor from the proxy to the tablet server for every request. RSRow is just a selected row. We call it rsrow to distinguish a selected row from a row of a table in the database in our coding.
required.
Database Arguments - To be read from the DataBase. Listing of all columns that this operation is referencing. TServers will need to read these columns when processing this read request. DEPRECATED in favor of col_refs.
Query options.
Reading distinct columns?
Currently only used on SELECT DISTINCT scan. If the value is greater than 0, use the specified prefix length to scan the table.
Flag for reading aggregate values.
Limit number of rows to return. For SELECT, this limit is the smaller of the page size (max number of rows to return per fetch) and the LIMIT clause, if present in the SELECT statement.
Paging state retrieved from the last response.
Return paging state when "limit" number of rows are returned? In case when "limit" is the page size, this is set for PgsqlResponsePB to return the paging state for the next fetch.
The upper limit for partition (hash) key when paging.
The version of the ysql system catalog this query was prepared against. Used when per-db catalog version mode is disabled.
The version of the ysql per-db system catalog this query was prepared against. Used when per-db catalog version mode is enabled.
The oid of the database of the Postgres session where this query was issued. Used when per-db catalog version mode is enabled.
Row mark as used by postgres for row locking.
Used only for explicit row-locking to identify 3 possibilities: WAIT_BLOCK, WAIT_SKIP, or WAIT_ERROR. See WaitPolicy for detailed information.
Scan partition boundary. NOTE:
- Boundaries indicate the scan range given by the SQL statement. This should be set by PgGate when it processes the query specification.
- paging_state indicates the current position of a scan. This should be set ONLY by DocDB, which scans the table.
- Don't overload these fields for different purposes to avoid confusion.
Deprecated fields. Field "max_partition_key" is replaced by "lower_bound" and "upper_bound". This field is only available in the beta version of the SPLIT AT feature. The release version for SPLIT AT does not need to be compatible with the beta version as the SPLIT AT feature was not used.
Does this request correspond to scanning the indexed table for backfill?
If present, the request is to collect a random sample.
Boundaries of blocks to be sampled at 2nd stage for getting sample of rows, empty during 1st stage.
Instruction for BACKFILL operation from the tablet server. This is the serialized-string of "message PgsqlBackfillSpecPB" which will be used by the tablet server handling the PgsqlRead to terminate the scan and checkpoint the paging state as desired.
Pushed down Postgres conditions. Like in Postgres, conditions are implicitly ANDed. Per scanned row, expressions are evaluated by PGgate one by one, and if any is evaluated to false the row is skipped.
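The implicit AND over pushed-down conditions, with evaluation stopping at the first false result, can be sketched as below. The row representation (a dict) and the helper `row_passes` are hypothetical; real conditions are Postgres expressions rather than Python callables.

```python
from typing import Callable, Dict, Iterable

Row = Dict[str, int]
Condition = Callable[[Row], bool]


def row_passes(row: Row, conditions: Iterable[Condition]) -> bool:
    """Evaluate conditions one by one; skip the row as soon as any is false."""
    for condition in conditions:
        if not condition(row):
            return False
    return True


conditions = [lambda r: r["a"] > 1, lambda r: r["b"] < 10]
rows = [{"a": 2, "b": 5}, {"a": 0, "b": 5}, {"a": 3, "b": 20}]
kept = [r for r in rows if row_passes(r, conditions)]
```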
Columns referenced by this write operation. Provides mapping between DocDB and Postgres identifiers, and provides Postgres type information to convert between DocDB and Postgres formats. One entry per column referenced.
Used only in pg client.
What metric changes to return in response.
TODO(get_table_key_ranges): deprecate this field after a separate GetTabletKeyRanges RPC to the tablet leader is used.
Used in:
Used in:
Used in:
Used in:
Gauge metrics changes as a result of processing the request.
Counter metrics changes as a result of processing the request.
Event stats metrics changes as a result of processing the request.
Count of table rows scanned as a result of processing the request.
Count of secondary index rows scanned as a result of processing the request.
Response from tablet server for both read and write.
Used in:
Internal Status information
required
True if this operation was not actually applied, for instance update to the same value.
User readable error message associated with Internal & External Status. DEPRECATED: please use error_status instead.
Sidecar of rows data returned
Paging state for continuing the read in the next QLReadRequestPB fetch.
When client sends a request that has a batch of many arguments, server might process only a subset of the arguments. Attribute "batch_arg_count" is to indicate how many arguments have been processed by server. NOTE: This variable could have been inside "paging state", but due to rolling upgrade I have to place it here, separated from paging status for singular request.
For use with geometric indexes. The nth value here is the distance of the nth returned row from the query point/vector. The distance is encoded into a binary representation and converted to int64, so if original_distance1 < original_distance2, then encoded_distance1 < encoded_distance2. A similar approach is widely used for the table primary key when it is converted to its binary representation. Here we do the same, with the addition that the distance can always be encoded in 8 bytes or less.
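One way to obtain an order-preserving int64 from a distance is to reinterpret the IEEE-754 bits of a non-negative double as a signed 64-bit integer. This sketch illustrates the ordering property only; it is an assumption that distances are non-negative doubles, and it is not claimed to be the encoding the server actually uses. `encode_distance` is a hypothetical name.

```python
import struct


def encode_distance(distance: float) -> int:
    """Map a non-negative double to an int64 such that ordering is preserved."""
    if distance != distance or distance < 0:
        raise ValueError("distance must be a non-negative number")
    # abs() normalises -0.0 to 0.0 so the sign bit is always clear.
    raw = struct.pack(">d", abs(distance))  # exactly 8 bytes
    return struct.unpack(">q", raw)[0]
```

For non-negative doubles the sign bit is zero, so the bit pattern grows with the value and the integer comparison agrees with the floating-point comparison.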
The offset of end of encoded row for particular vector.
Number of rows affected by the operation. Currently only used for update and delete.
PostgreSQL error code encoded as in errcodes.h or yb_pg_errcodes.h. See https://www.postgresql.org/docs/11/errcodes-appendix.html. DEPRECATED: please use error_status instead.
Transaction error code, obtained by static_cast of TransactionErrorTag::Decode of Status::ErrorData(TransactionErrorTag::kCategory). DEPRECATED: please use error_status instead.
Random sample state to hand over to the next block
Instruction for BACKFILL operation from master/table server. This is the serialized-string of "message PgsqlBackfillSpecPB". If 'is_backfill_batch_done' == true, Postgres will return this 'backfill_spec' to tablet server that sent out the 'BACKFILL' request statement.
Table's partition list version obtained from YBTable during response preparation. Table cache should be invalidated on partition list version mismatch detected.
Encapsulates data from a Status instance. The repeated field provides support for multiple Statuses per response, allowing the return of not only a detailed error, but also warnings and notices.
Metrics changes as a result of processing the request.
Response status
Used in:
Used in:
Random sampling state
Used in:
target number of rows to collect
number of rows collected so far across all blocks
number of rows considered for sample across all blocks
number of total rows estimated
current number of rows to skip
reservoir state W
128 bits of sampler random state.
number of blocks collected so far across all tablets
number of blocks considered so far for sample across all tablets
Whether caller expects only sample blocks boundaries as a result of this request processing.
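The counters in the sampling state (target rows, rows collected, rows considered) map onto classic reservoir sampling. The sketch below uses the simple Algorithm R, which is swapped in for illustration; the skip count and "reservoir state W" fields suggest the real sampler uses a skip-based variant, which is not reproduced here. The `Reservoir` class is hypothetical.

```python
import random
from typing import List


class Reservoir:
    """Keep a uniform random sample of at most target_rows items from a stream."""

    def __init__(self, target_rows: int, seed: int = 0) -> None:
        self.target_rows = target_rows
        self.rows_considered = 0
        self.sample: List[int] = []
        self._rng = random.Random(seed)  # sampler random state

    def offer(self, row: int) -> None:
        self.rows_considered += 1
        if len(self.sample) < self.target_rows:
            # Fill the reservoir first.
            self.sample.append(row)
            return
        # Replace a random slot with probability target_rows / rows_considered.
        slot = self._rng.randrange(self.rows_considered)
        if slot < self.target_rows:
            self.sample[slot] = row
```

Because all state lives in the object, it can be handed from one block to the next, which mirrors how the sampling state is passed along between requests.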
Instruction.
Used in:
Client info
required
Statement info
client request id.
required
Table id.
required
Table schema version.
required
Row Identifiers provides information for the row that the server INSERTs, DELETEs, or UPDATEs.
- hash_code, if provided, helps locating the row. It is used by the proxy to find the tablet server and by DocDB to find a set of matching rows.
- partition_column_values are for partition columns, e.g. for h1 = 1 AND h2 = 2 partition_column_values would be [1, 2].
- range_column_values are for range columns.
- column_values are for non-primary columns. They are used to further identify the row to be inserted, deleted, or updated.
NOTE: Primary columns are defined when creating tables. These columns are different from those columns that are used as part of an INDEX.
Not used with UPDATEs. Use column_new_values to UPDATE a value.
Inserted rows in packed format. Even indexes are used to encode keys, odd indexes are used to encode values. partition_column_values, range_column_values, ybctid_column_value are ignored if packed rows are specified.
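The even/odd layout of packed rows can be unpacked by pairing alternating entries. This is a sketch of the indexing convention only; `split_packed` is a hypothetical helper and the byte strings stand in for encoded keys and values.

```python
from typing import List, Tuple


def split_packed(packed: List[bytes]) -> List[Tuple[bytes, bytes]]:
    """Pair even-indexed entries (keys) with the odd-indexed entries (values) that follow."""
    if len(packed) % 2 != 0:
        raise ValueError("packed rows must contain key/value pairs")
    return list(zip(packed[0::2], packed[1::2]))
```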
If this attribute is present, this request is a batch request.
Column New Values. - Columns to be overwritten (UPDATE SET clause). It shouldn't contain primary-key columns.
Tuple descriptor for RETURNING clause. For now, we send the rsrow descriptor from the proxy to the tablet server for every request. RSRow is just a selected row. We call it rsrow to distinguish a selected row from a row of a table in the database in our coding.
required for a RETURNING clause.
Where clause condition
Listing of all columns that this write operation is referencing. TServers will need to read these columns when processing the write request. DEPRECATED in favor of col_refs.
The version of the ysql system catalog this query was prepared against. Used when per-db catalog version mode is disabled.
The version of the ysql per-db system catalog this query was prepared against. Used when per-db catalog version mode is enabled.
The oid of the database of the Postgres session where this query was issued. Used when per-db catalog version mode is enabled.
True only if this changes a system catalog table (or index).
Does this request correspond to backfilling an index table?
For DELETE requests, do they need to be persisted in regular rocksdb? This is currently only used for DELETEs to the index when the index is getting created with index backfill enabled.
Columns referenced by this write operation. Provides mapping between DocDB and Postgres identifiers, and provides Postgres type information to convert between DocDB and Postgres formats. One entry per column referenced.
Used only in pg client.
During ysql major catalog upgrades, we block pg from updating the catalog data. Bypass this check when we know it's safe. Valid cases are InitDB, pg_upgrade, and temp tables.
Statement types
Used in:
This represents one instance of a placement constraint for a table. It is used to dictate what is the minimum number of expected replicas in a certain cloud/region/zone combo.
Used in:
The cloud, region and zone information for this placement block.
The minimum number of replicas that should always be up in this placement.
This keeps track of the set of PlacementBlockPBs defining the placement requirements for a certain table. This is used both in the on-disk storage in SysCatalog, as well as in the actual table creation calls and in the schema returned to client queries. This is tightly coupled with the overall num_replicas for a certain table, as we want to be able to specify requirements both per placement block and for the overall replication factor (RF) of the table.
Used in:
16 byte uuid
Used in:
Used in:
Ex: yb-master, yb-tserver.
flag_infos is only populated after version 2.20. AutoFlags that were promoted before this version will be backfilled with default values.
Arbitrary protobuf that has one PB dependency.
Used in:
Arbitrary protobuf that has two PB dependencies.
Arbitrary protobuf to test writing a containerized protobuf.
Used in:
Builtin call expression. There are 3 different calls.
- Builtin operators such as '>', '<', '=', ... These operators can be executed anywhere.
- Builtin functions such as Now(). These functions can be executed anywhere.
- Server builtin functions. Only tablet servers can execute these functions.
TODO(neil) Regular builtin operators. This message can be executed anywhere.
- This is more efficient than a builtin call as it avoids most of the overhead of calling the builtin lib.
- Merge the current condition operator execution with this.
- To optimize certain operations (such as +), replace the builtin function with a builtin op.
Used in:
Client type.
Used in:
A column value, optionally with subscripts, e.g. m['x'] or l[2]['x']
Used in:
A logical condition that evaluates to true/false. Used in the WHERE clause.
Used in:
An expression in a WHERE condition
Used in:
Bind variable index.
This should be replaced with builtin operator.
Regular builtin calls.
Tablet server builtin calls.
Builtin operator calls.
Json column operators.
Tuple expression in LHS.
Represents operations applied to a json column.
Used in:
Used in:
Used in:
Expression operators.
Used in:
Logic operators that take one operand.
Logic operators that take two or more operands.
Relation operators that take one operand.
Relation operators that take two operands.
Relation operators that take three operands.
Operators that take no operand. For use in "if" clause only currently.
IF EXISTS
IF NOT EXISTS
Used in:
Table UUID to verify the same table still exists when continuing in the next fetch.
Partition key to find the tablet server of the next row to read.
The row key (SubDocKey = [DocKey + HybridTimestamp]) of the next row to read.
Running total number of rows read across fetches so far. Needed to ensure we read up to the number of rows in the SELECT's LIMIT clause across fetches.
For selects with IN condition on the hash columns there are multiple partitions that need to be queried, one for each combination of allowed values for the hash columns. This holds the index of the next partition and is used to resume the read from the right place.
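To illustrate the "one partition per combination of allowed values" idea, here is a small sketch. The enumeration order, the function names, and the parameter `next_partition_index` are assumptions made for the example; only the concept of resuming from a stored index comes from the description above.

```python
from itertools import product


def hash_partitions(allowed_values):
    """Enumerate every combination of allowed values for the hash columns.

    allowed_values holds one list per hash column, e.g. for
    WHERE h1 IN (1, 2) AND h2 IN (10, 20) it is [[1, 2], [10, 20]].
    """
    return list(product(*allowed_values))


def remaining_partitions(allowed_values, next_partition_index):
    """Resume the scan from the partition index kept in the paging state."""
    return hash_partitions(allowed_values)[next_partition_index:]


all_parts = hash_partitions([[1, 2], [10, 20]])
resumed = remaining_partitions([[1, 2], [10, 20]], 2)
```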
The number of valid rows that we've skipped so far. This is needed to properly implement SELECT's OFFSET clause.
Request id of the first request, that started paging.
Used read time.
SELECT row counters. NOTE:
- The select-row-counter is always applied to the entire SELECT statement.
- The rest of the counters might be applied to just one batch. When an indexed query returns data in batches, these counters would be for just one specific batch.
Example on nested query: SELECT <data> FROM <table> WHERE <keys> IN (SELECT <keys> FROM <index> WHERE hash IN <set>)
- The <row_counter> is used for query "SELECT <data>".
- The rest of the counters are for nested query "SELECT <keys>".
Schema version:
- A query is prepared for a specific schema version.
- A paging state of a query execution is for a specific schema version.
- When the schema version changes, we can either continue or raise an error for the execution. This is controlled by the g-flag 'cql_check_table_schema_in_paging_state'. The flag is 'true' by default, which means: raise an error in this case to prevent data reinterpretation on the driver side.
Example:
- Prepare queryX = "SELECT a, b FROM t"
- Execute queryX - get page1.
- Alter t column c; -> This alter shouldn't affect the query on columns a & b.
- Execute queryX (continue) -> Should this be an error? Yes, raise error if '--cql_check_table_schema_in_paging_state=true'.
QLRSColDesc is the descriptor of a SELECT'ed column in a ResultSet, which can be any expression and not just table columns.
Used in:
Descriptor of a row in a resultset.
Used in:
TODO(neil) The protocol for select needs to be changed accordingly when we introduce and cache execution plan in tablet server.
Used in:
Client info
required
client request id - for debug tracing purpose only
Table schema version
required
Hashed key of row(s) to read - all fields required. The hashed column values must be in the same order as the column order in the table schema. If only a subset of hash columns is specified in the WHERE clause of the SELECT statement, "hashed_column_values" will be empty and we will do a full-table query across tablets.
Where clause condition
If clause condition
TODO(neil) Currently, we need only the datatypes of the rsrow descriptor. However, when we optimize our execution pipeline to bypass the QL layer, we might need to send the name as part of the prepared protobuf so that the server knows how to form the result set without help from QL. For now, we send the rsrow descriptor from the proxy to the tablet server for every request. RSRow is just a selected row. We call it rsrow to distinguish a selected row from a row of a table in the database in our coding.
required.
required.
Reading distinct columns?
Currently only used on SELECT DISTINCT scan. If the value is greater than 0, use the specified prefix length to scan the table.
Limit number of rows to return. For QL SELECT, this limit is the smaller of the page size (the maximum number of rows to return per fetch) and the LIMIT clause, if present in the SELECT statement.
The offset from which we should start returning rows. This is primarily used to support the QL OFFSET clause.
Paging state retrieved from the last response.
Return paging state when "limit" number of rows are returned? In case when "limit" is the page size, this is set for QLResponsePB to return the paging state for the next fetch.
The remote endpoint sending this request. This is filled in by the server and should not be set.
If this request comes from proxy, it should be filled with UUID of this proxy.
the upper limit for partition (hash) key scan ranges (inclusive)
Listing of all columns that this operation is referencing. TServers will need to read these columns when processing this read request.
Id used to track different queries.
Flag for reading aggregate values.
These columns must be read by DocDB before a read or write request can be executed.
Used in:
Used in:
Status and error message
required
Schema of the rows returned if present (used by conditional DML (write) request only as of Jan 2017).
Sidecar of rows data returned
Paging state for continuing the read in the next QLReadRequestPB fetch.
Result of child transaction.
For conditional DML: indicate if the DML is applied or not according to the conditions.
Response status
Used in:
Paging state for continuing a read request. For a SELECT statement that returns many rows, the client may specify how many rows to return at most in each fetch. This paging state maintains the state for returning the next set of rows of the statement. This paging state is opaque to the client.
When there should be more rows to return from the same tablet in the next fetch, "next_row_key" is populated in DocDB (QLReadOperation) with the DocKey of the next row to read. We also embed a hybrid-time which is the clean snapshot time for read consistency. We also populate the "next_partition_key" for the next row, which is the hash code of the hash portion of the DocKey. This next partition key is needed by YBClient (Batcher) to locate the tablet to send the request to and it doesn't have access to the DocDB function to decode and extract from the DocKey.
When we are done returning rows from the current tablet and the next fetch should continue in the next tablet (possible only for full-table query across tablets), "next_partition_key" is populated by the current tablet with its exclusive partition-end key, which is the start key of next tablet's partition. "next_row_key" is empty in this case which means we will start from the very beginning of the next tablet. (TODO: we need to return the clean snapshot time in this case also).
RowCounter message.
- These state variables are exchanged between users and CQL server when query result is paged.
- DocDB does not use these.
Used in:
The following counters represent LIMIT and OFFSET clause in SELECT and their status. SELECT ... FROM table LIMIT xxx OFFSET yyy;
- When a field value == -1, that field is not used.
- When a field value >= 0, it is valid.
- Example: If "select_limit == 0", users are querying zero rows.
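The "-1 means unused, 0 or more means valid" convention can be sketched as below. Only `select_limit` is named in the description; `select_offset`, `UNUSED`, and `apply_limit_offset` are hypothetical names used for illustration.

```python
UNUSED = -1  # sentinel: the clause is absent from the SELECT statement


def apply_limit_offset(rows, select_limit=UNUSED, select_offset=UNUSED):
    """Apply LIMIT/OFFSET counters, treating -1 as "clause not used"."""
    start = select_offset if select_offset >= 0 else 0
    if select_limit >= 0:
        # A limit of 0 is valid and means the user asked for zero rows.
        return rows[start:start + select_limit]
    return rows[start:]


page = apply_limit_offset([1, 2, 3, 4, 5], select_limit=2, select_offset=1)
```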
Sequence of values used to represent Lists, Sets and Tuples
Used in:
Reference to a subcolumn, e.g. m['x'] or l[2]['x']
Used in:
Tuple expression in LHS of IN operation (in future for =, >, < too).
Used in:
Used in:
Fields for user-defined types
Used in:
A QL value
Used in:
Note: an absent value means NULL
Note: min int size in protobuf is int32
Sometimes YB sends non-UTF8 string in string field. For example, this can happen with a collated string: under "en_US.utf8" collation, string 'abc' is encoded as string_value: "\000\201\014\r\016\001\t\t\t\001\t\t\t\000abc" which is not a valid UTF8 string. That is why 'bytes' is used for string_value field.
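The collated-string example above can be checked directly. The snippet below is a small sketch (the helper `is_valid_utf8` is a name chosen for this example) showing that the quoted byte sequence fails strict UTF-8 decoding, which is why a 'bytes' field is the safe choice.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The collation-encoded form of 'abc' quoted above, written with octal escapes.
collated = b"\000\201\014\r\016\001\t\t\t\001\t\t\t\000abc"

# The byte \201 (0x81) is a continuation byte with no lead byte before it,
# so strict UTF-8 decoding rejects the whole sequence.
collated_is_utf8 = is_valid_utf8(collated)
```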
raw bytes for inet address in network byte order.
raw bytes for uuid value.
raw bytes for timeuuid value.
Number of days where 2^31 corresponds to 1970-01-01 (see DateTime::DateFromString)
Number of nano-seconds, from 0 to (24 * 60 * 60 * 1,000,000,000 - 1) (see DateTime::kMaxTime)
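The date and time encodings described above can be sketched as follows. The function names are hypothetical; the only facts taken from the descriptions are that 2^31 corresponds to 1970-01-01 and that time is a nanosecond count within one day.

```python
from datetime import date, time, timedelta

EPOCH = date(1970, 1, 1)
EPOCH_OFFSET = 2**31  # the value that represents 1970-01-01
MAX_TIME = 24 * 60 * 60 * 1_000_000_000 - 1  # last nanosecond of a day


def encode_date(d: date) -> int:
    """Days relative to 1970-01-01, shifted so the epoch maps to 2^31."""
    return EPOCH_OFFSET + (d - EPOCH).days


def decode_date(value: int) -> date:
    """Inverse of encode_date."""
    return EPOCH + timedelta(days=value - EPOCH_OFFSET)


def encode_time(t: time) -> int:
    """Nanoseconds since midnight (Python's time only has microseconds)."""
    seconds = (t.hour * 60 + t.minute) * 60 + t.second
    return seconds * 1_000_000_000 + t.microsecond * 1_000
```

Centering the range on 2^31 lets dates before 1970 be stored in an unsigned 32-bit value.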
Represent various system values.
Null category for GIN indexes.
raw bytes for bson value.
YQL support. The following section is common to all query languages. Represent system internal values.
Used in:
Used in:
Statement type
required
Client info
required
client request id - for debug tracing purpose only
Table schema version
required
Primary key of the row to insert/update/delete - all fields required. The hashed and range column values must be in the same order as the column order in the table schema. Note: the hash_code is the hash of the hashed_column_values. Technically, this can be recomputed by tserver also, but since the client already calculates this to look up the correct tablet server, it is passed in.
Column values to insert/update/delete - required Note: DELETE statement has no column value.
Where clause condition -- currently this is only allowed for deletes.
If clause condition
Else error clause
Time to live in milliseconds.
Listing of all columns that this write operation is referencing. TServers will need to read these columns when processing the write request.
Id used to track different queries.
User-provided timestamp in microseconds.
Ids of indexes that need update.
Child transaction data for additional write requests necessary for index updates, etc.
Whether to return a status row reporting applied status and execution errors (if any).
Does this request correspond to backfilling an index table?
Statement types
Used in:
See ReadHybridTime for explanation of this message.
Used in:
When a transaction reads data, all committed writes of other transactions with a timestamp <= read_ht are visible. However, a transaction's reads should be able to see all provisional writes from the same transaction irrespective of read_ht, except some based on additional constraints for YSQL (as described in src/yb/yql/pggate/README). The query layer uses in_txn_limit_ht to meet these constraints. If in_txn_limit_ht is set, any operation will not be able to read provisional writes from the same transaction which have a write timestamp higher than the in_txn_limit_ht. src/yb/yql/pggate/README also explains how this works along with YSQL's buffering logic.
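The visibility rule above can be written as a small predicate. This is a simplified sketch with plain integers standing in for hybrid times; the function `is_visible` and its parameters are hypothetical, and the additional YSQL constraints documented in the pggate README are not modeled.

```python
def is_visible(write_ht, committed, from_same_txn, read_ht, in_txn_limit_ht=None):
    """Decide whether a write is visible to a read, per the rule above."""
    if from_same_txn:
        # Own provisional writes are visible irrespective of read_ht,
        # unless they are newer than in_txn_limit_ht (when it is set).
        return in_txn_limit_ht is None or write_ht <= in_txn_limit_ht
    # Writes of other transactions must be committed at or before read_ht.
    return committed and write_ht <= read_ht


# An own write at time 50 is visible at read_ht=10 unless the limit excludes it.
own_write_seen = is_visible(50, False, True, read_ht=10)
own_write_hidden = is_visible(50, False, True, read_ht=10, in_txn_limit_ht=40)
```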
SADD
Used in:
Following options are for ZADD only.
Modify the return value from the number of new elements added, to the total number of elements changed.
APPEND
Used in:
(message has no fields)
Used in:
Used in:
Used only with ZRANGEBYSCORE, ZREVRANGE.
Used in:
Used in:
DEL, HDEL
Used in:
(message has no fields)
EXISTS, HEXISTS
Used in:
(message has no fields)
RENAME
Used in:
(message has no fields)
GETRANGE
Used in:
Required
Required
GET, HGET, MGET, HMGET, HGETALL, SMEMBERS, HKEYS, HLEN
Used in:
Used in:
GETSET
Used in:
(message has no fields)
TTL, PTTL
Used in:
By default, the return value is in milliseconds.
INCR, INCRBY, INCRBYFLOAT, HINCRBY, HINCRBYFLOAT, ZINCRBY, DECR, DECRBY, ZADD with incr option
Used in:
Used in:
Used in:
LINSERT
Used in:
Required
- Even if just a key is needed, or there are multiple values, this is used.
- In case of referring to an entity within a container, the type of the outer_key and the subkey or index of the inner entity is specified.
- String : Set the key and value only (setting STRING type is optional as it is understood).
- List : Set the key, index, and value.
- Set : Set the key, and value (possibly multiple depending on the command).
- Hash : Set key, subkey, value.
- SortedSet : Set key, subkey, value (value is interpreted as score).
- Timeseries: Set key, subkey, value (timestamp_subkey in RedisKeyValueSubKeyPB is interpreted as timestamp).
- Value is not present in case of an append, get, exists, etc.
For multiple inserts into a container, the subkey and value fields have one or more items.
Used in:
Note: the hash_code is the hash of the 'key' below. Technically, this can be recomputed by tserver also, but since the client already calculates this to look up the correct tablet server, it is passed in.
Required
It is assumed that all subkeys are distinct for HMSET and SADD. For collection range requests, we would have exactly two subkeys denoting the lower bound and upper bound for the range request.
Used in:
Timestamp used in the redis timeseries datatype.
Double used in redis sorted set datatype.
KEYS
Used in:
No operation.
Used in:
(message has no fields)
This enum is used to specify the insertion position (Insert after or before index).
Used in:
LPOP, RPOP, SPOP; blocking versions BLPOP etc. currently not supported.
Used in:
Count is allowed only when popping from a set.
RPUSH, RPUSHX, LPUSH, LPUSHX
Used in:
Used in:
The maximum number of entries to retrieve for a range request.
Used in:
Used in:
Nil is when a value is not found, but it isn't an error case, e.g. GET key when key has no value.
Not found is an error when an existing value is needed for the command, e.g. RENAME.
This code is set in the client to mark commands for which no response was received. This happens when the batch RPC fails somewhere and the individual requests don't get processed.
SETRANGE
Used in:
Required
SET, SETNX, SETXX, HSET, HSETNX, LSET, MSET, HMSET, MSETNX
Used in:
Expiration time in milliseconds.
PERSIST, (P)EXPIRE, (P)EXPIREAT
Used in:
Expiration time in milliseconds.
For (P)EXPIRE
For (P)EXPIREAT
This enum is used to specify the side of a list (For LPOP or RPOP etc.).
Used in:
STRLEN, HSTRLEN
Used in:
(message has no fields)
Wrapper for a subkey which denotes an upper/lower bound for a range request.
Used in:
Used in:
Used in:
Used in:
A single Redis request. Some user commands like MGET should be split into multiple of these.
Used in:
Only one of the following fields should be set.
Used in:
One or more WAL segments are missing an expected op id.
There is a schema mismatch between the producer and consumer tables.
There is a colocated table missing on the consumer side.
Replication state has not yet been initialized.
Replication is healthy.
The AutoFlags config has changed and the new version is not compatible, or has not yet been validated.
Error connecting to the source universe.
There was a generic system error.
The replication stream has been paused by the user.
Higher level structure to keep track of all types of replicas configured. This will have, at a minimum, the information about the replicas that are supposed to be active members of the raft configs, but can also include extra information, such as read only replicas.
Used in:
The LSN type which will be used with the replication slot and stream metadata.
Used in:
The ordering mode to be used with logical replication.
Used in:
Used for Cassandra Roles and Permissions
Used in:
This enum matches enum RowMarkType defined in src/include/nodes/plannodes.h. The exception is ROW_MARK_ABSENT, which signifies the absence of a row mark.
Used in:
Obtain exclusive tuple lock.
Obtain no-key exclusive tuple lock.
Obtain shared tuple lock.
Obtain keyshare tuple lock.
Not supported. Used for postgres compatibility.
Not supported. Used for postgres compatibility.
Obtain no tuple lock (this should never be sent on the wire). The value should be high for convenient comparisons with the other row lock types.
Used in:
Added in 2.13, any upgraded cluster may not have this value so use it appropriately.
Used in:
Overarching information like "Aggregate" or "Limit"
Used in:
If there is an error communicating with the server (or retrieving the server registration on the server itself), this field will be set to contain the error. All subsequent fields are optional, as they may not be set if an error is encountered communicating with the individual server.
If an error has occurred earlier in the RPC call, the role may not be set.
RPC and HTTP addresses for each server, as well as cloud related information.
Used in:
Placement uuid of the tserver's cluster.
Used in:
Used in:
Stateful services.
Used in:
Test service.
Used in:
Used in:
This is not a simple set representation, but rather the encoded output of a yb::UnsignedIntSet<uint32_t>. An ascending nonoverlapping series of integer ranges [a,b) [c,d) ... (a<=b<c<=d...) is encoded by taking a, b, c, d, ... and then compressing it by storing the set of deltas starting from 0. This gives: a-0, b-a, c-b, d-c, ...
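The delta encoding described above can be sketched in a few lines. This is an illustration of the scheme only (each boundary stored as its distance from the previous boundary, starting from 0); the function names are hypothetical and this is not the yb::UnsignedIntSet implementation.

```python
def encode_ranges(ranges):
    """Encode ascending half-open ranges [(a, b), (c, d), ...] as deltas."""
    deltas = []
    previous = 0
    for start, end in ranges:
        deltas.append(start - previous)  # a-0, then c-b, ...
        deltas.append(end - start)       # b-a, then d-c, ...
        previous = end
    return deltas


def decode_ranges(deltas):
    """Rebuild the ranges by accumulating the deltas."""
    ranges = []
    position = 0
    for i in range(0, len(deltas), 2):
        start = position + deltas[i]
        end = start + deltas[i + 1]
        ranges.append((start, end))
        position = end
    return ranges


# [1,3) and [5,6) flatten to 1, 3, 5, 6 and compress to 1, 2, 2, 1.
encoded = encode_ranges([(1, 3), (5, 6)])
```

Small deltas keep the encoded values short when the set holds a few dense runs of integers.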
Used in:
Used in:
This enum matches enum table locks defined in src/include/storage/lockdefs.h. Table level lock conflict matrix. Source: https://www.postgresql.org/docs/15/explicit-locking.html#LOCKING-TABLES
Used in:
Though NONE is listed as a lock type, it seems to be a flag conveying "don't get a lock". Despite that, if a lock request with NONE type makes it to DocDB, we return an error status.
Used in:
For index table only: consistency with respect to the indexed table.
Save this number of tablets in table schema stored in master SysCatalog. It might not be the true number of tablets in the table at runtime due to tablet splitting. It is just the number of tablets stored in the schema (for example, when user creates the table). Note: If CreateTableRequestPB::num_tablets is not specified, but this field is specified in the field CreateTableRequestPB::schema, CatalogManager::CreateTable() will create a table with number of tablets in this field.
Used to distinguish which algorithm should be used for partition key and bounds generation, value == 0 stands for the default buggy algorithm for range partitioning case, see #12189.
This must exactly match YBTableType in client.h. We have static_assert's in tablet-test.cc to verify this.
Used in:
Used in:
The TableId containing the PG table OID. Note: This will NOT match the DocDB table's uuid if the table was rewritten.
Awaiting lock info of single shard waiters.
Leader term of the peer populating the response. Used to filter out duplicate responses for the same tablet and ensure we pick the most recent results.
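Filtering duplicate responses by leader term can be sketched as below. The tuple layout `(tablet_id, term, payload)` and the function name are assumptions made for the example; the only idea taken from the description is "keep the response with the highest term per tablet".

```python
def pick_latest_by_term(responses):
    """Keep, for each tablet, the payload reported with the highest term.

    responses is an iterable of (tablet_id, term, payload) tuples.
    """
    latest = {}
    for tablet_id, term, payload in responses:
        if tablet_id not in latest or term > latest[tablet_id][0]:
            latest[tablet_id] = (term, payload)
    return {tablet_id: payload for tablet_id, (_, payload) in latest.items()}


result = pick_latest_by_term([
    ("t1", 1, "stale"),
    ("t1", 3, "fresh"),
    ("t2", 2, "only"),
    ("t1", 2, "older"),
])
```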
Indicates whether the tablet is associated with an advisory lock table. An advisory lock table has the YQL table type and requires different handling compared to a regular PGSQL table.
Lock info for a given transaction at this tablet.
Used in:
Used in:
Used in:
Used in:
Used in:
Used in:
Number of batches written to this tablet during transaction.
Used in:
Used in:
16 byte uuid
Stores the time at which this transaction was taken by pg_client_session from the transaction pool. Serves as a lower bound on the ht at which the first RPC which may write intents for this transaction would have been issued to a tserver.
Stores the time at which the transaction was initiated from postgres' perspective.
Stores time when metadata was written to provisional records RocksDB on a participating tablet. So it could be used for cleanup.
Stores whether the transaction is local or global; treated as global if not present for backwards compatibility.
Used in:
The APPLYING and PROMOTING statuses are used in Raft in transaction participant tablets but not in status tablets.
All following entries are not used in RAFT, but as events between status tablet and involved tablets:
Used in:
The universe_uuid received from the master leader on the first heartbeat after creation or an upgrade. This is the identifier for the cluster this server belongs to. It is passed in every heartbeat request and compared with the value on master leader to ensure a match. Only set on the tserver.
Used in:
Used in:
For update, scan_type is currently only primary key lookup.
Information about the build environment, configuration, etc.
Used in:
This enum matches enum LockWaitPolicy defined in src/include/nodes/lockoptions.h and applies only to explicit row-level locking policies. NOTE: WAIT_ERROR implies no-wait semantics in Postgres, i.e., if a conflicting lock is held, fail self's transaction. However, in YSQL, the WAIT_ERROR enum maps to the Fail-on-Conflict conflict management policy (where we may fail other transactions depending on priority, as described in conflict_resolution.h). It doesn't map to no-wait semantics because they aren't yet supported by DocDB. Note, however, that the NOWAIT clause is supported for the READ COMMITTED isolation level when no transactions of other isolation levels are active. YSQL achieves that without relying on no-wait semantics from the tserver, by leveraging the fact that the Fail-on-Conflict policy for read committed transactions is effectively no-wait semantics, because all read committed transactions have the same kHighest priority. TODO(concurrency-control): When NOWAIT support is to be extended for the REPEATABLE READ and SERIALIZABLE isolation levels, we would require using WAIT_ERROR to rightly mean the no-wait policy and another enum to represent the Fail-on-Conflict policy. YSQL specifies WaitPolicy explicitly only for reads via PgsqlReadRequestPB that take explicit row-level locks.
Used in:
Maps to Wait-on-Conflict conflict management policy as defined in conflict_resolution.h
Maps to Skip-on-Conflict conflict management policy as defined in conflict_resolution.h This is needed for the SKIP LOCKED clause
Maps to Fail-on-Conflict conflict management policy as defined in conflict_resolution.h
Represents the wait-state tracked for a given incoming rpc/request.
Used in:
The metadata is rarely modified.
Represents the state of the current rpc/request when it was polled.
Some additional information that may be persisted as yb_active_universe_history.wait_event_aux
A string representation of wait_state_code for easy human consumption.
Information for a namespace currently/previously under automatic mode xCluster replication.
Used in:
Used in:
This should never occur; not stored.
Used to denote that we cannot determine the role; not stored.
No automatic mode replication is occurring for this namespace.
Used in:
YCQL and YSQL. Unidirectional and Bidirectional. Table level.
YSQL, Unidirectional only. Table level.
YSQL, Transactional and Unidirectional only. DB level.
Used in:
This consistency level provides Linearizability guarantees and is the default for our system.
Consistent-prefix consistency means that we always see a consistent snapshot of the database in a well-defined order. If operations A, B and C take place, we will either see A, AB, or ABC. Note that reads might still go back in time since we might see ABC on one replica and AB on another.
For cross-shard transactions only: user-enforced consistency level means it is the user's responsibility to enforce consistency across shards or tables/indexes.
Client type.
Used in:
Pgsql database
See yb_sampling_algorithm flag description. Should be in sync with YbcSamplingAlgorithmEnum in GUC (ybc_util.h).
Used in: