A Raft implementation.
Analogous to AppendEntries in Raft, but only used for followers.
Similar to UpdateConsensus but takes a batch of ConsensusRequestPB and returns a batch of ConsensusResponsePB.
RequestVote() from Raft.
A request from a candidate peer that wishes to become leader of the configuration serving tablet with 'tablet_id'. See Raft sec. 5.2.
UUID of server this request is addressed to.
Identifies the tablet configuration that the vote is being requested for.
The uuid of the sending peer.
The term we are requesting a vote for. If this term is higher than the callee's term, the callee will update its own term to match, and if it is the current leader it will step down.
The candidate node status, so that the voter node can decide whether to vote for it as LEADER. In particular, this includes the last OpId persisted in the candidate's log, which corresponds to the lastLogIndex and lastLogTerm fields in Raft. A replica must vote no for a candidate whose last OpId is lower than its own.
Normally, replicas will deny a vote with a LEADER_IS_ALIVE error if they are a leader or recently heard from a leader. This is to prevent partitioned nodes from disturbing liveness. If this flag is true, peers will vote even if they think a leader is alive. This can be used for example to force a faster leader hand-off rather than waiting for the election timer to expire.
True if this is a pre-election request, in which case the state of the voter should not be changed.
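The vote request fields above can be read as inputs to a small decision procedure. The following is a minimal sketch of that procedure, not the actual implementation: the class and function names are hypothetical, the order of the checks is an assumption, and of the result strings only INVALID_TERM and LEADER_IS_ALIVE are named in this document (ALREADY_VOTED and LAST_OPID_TOO_OLD are descriptive labels chosen for the sketch).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True, order=True)
class OpId:
    """Compared as (term, index), mirroring lastLogTerm/lastLogIndex in Raft."""
    term: int
    index: int


@dataclass
class VoterState:
    """Hypothetical in-memory view of the voting replica."""
    current_term: int
    last_logged: OpId
    voted_for: Optional[str] = None
    leader_is_alive: bool = False


def handle_vote_request(voter: VoterState, candidate_uuid: str, candidate_term: int,
                        candidate_last_op: OpId, ignore_live_leader: bool = False,
                        preelection: bool = False) -> Tuple[bool, Optional[str]]:
    """Return (vote_granted, error_label). Check order is an assumption."""
    # Deny if a leader is believed alive, unless the caller overrides this.
    if voter.leader_is_alive and not ignore_live_leader:
        return False, "LEADER_IS_ALIVE"
    # A candidate with an earlier term is always denied.
    if candidate_term < voter.current_term:
        return False, "INVALID_TERM"
    # A higher term advances the voter's term, except in a pre-election,
    # which must leave the voter's state untouched.
    if candidate_term > voter.current_term and not preelection:
        voter.current_term = candidate_term
        voter.voted_for = None
    # At most one vote per term.
    if candidate_term == voter.current_term and voter.voted_for not in (None, candidate_uuid):
        return False, "ALREADY_VOTED"
    # The candidate's log must be at least as up to date as the voter's.
    if candidate_last_op < voter.last_logged:
        return False, "LAST_OPID_TOO_OLD"
    if not preelection:
        voter.voted_for = candidate_uuid
    return True, None
```

Note that a denied request can still advance the voter's term, which matches the description of the term field: the callee updates its term whenever the caller's term is higher.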
A response from a replica to a leader election request.
The uuid of the node sending the reply.
The term of the node sending the reply. Allows the candidate to update itself if it is behind.
True if this peer voted for the caller, false otherwise.
An upper bound on the remainder of the old leader's lease that the new leader has to wait out before it can start serving read and write requests.
UUID of node that has lease.
An upper bound on the deadline of the old hybrid time leader's lease that the new leader has to wait out before it can start serving read and write requests. Contains only physical part of hybrid time.
UUID of node that has the ht lease. The lease and the ht lease have different lifetimes, so the two leases and their corresponding UUIDs could differ.
True if this is a response to a pre-election request.
TODO: Migrate ConsensusService to the AppStatusPB RPC style and merge these errors. Error message from the consensus implementation.
A generic error message (such as tablet not found).
Implements all of the one-by-one config change operations, including AddServer() and RemoveServer() from the Raft specification, as well as an operation to change the role of a server between VOTER and PRE_VOTER. An OK response means the operation was successful.
A configuration change request for the tablet with 'tablet_id'. These requests are restricted to one-by-one operations, as specified in Diego Ongaro's Raft PhD thesis. This is the RPC request, but it does not end up in the log. See also ChangeConfigRecordPB.
UUID of server this request is addressed to.
The type of config change requested. This field must be specified, but is left as optional due to being an enum.
The peer to add or remove. When 'type' == ADD_SERVER, both the permanent_uuid and last_known_addr fields must be set. For REMOVE_SERVER, either the permanent_uuid field or last_known_addr should be set - 'use_host' is for using the latter.
The OpId index of the committed config to replace. This optional parameter is here to provide an atomic (compare-and-swap) ChangeConfig operation. The ChangeConfig() operation will fail if this parameter is specified and the committed config does not have a matching opid_index. See also the definition of RaftConfigPB.
For REMOVE case, when this is set, we assume that the peer is dead and will not affect the quorum anymore. So we skip checking/using its server uuid and remove it based on the ip/port info from 'server'.
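The field rules for a config change request can be checked before the request is sent. The sketch below is a hypothetical client-side validator under stated assumptions: the function name, the string values used for 'type', and the parameter names `cas_config_opid_index` and `committed_opid_index` are illustrative and not taken from this page.

```python
from typing import List, Optional


def validate_change_config(change_type: Optional[str],
                           permanent_uuid: Optional[str],
                           last_known_addr: Optional[str],
                           use_host: bool = False,
                           cas_config_opid_index: Optional[int] = None,
                           committed_opid_index: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if change_type is None:
        # The field is declared optional but must be specified.
        problems.append("type must be specified")
    elif change_type == "ADD_SERVER":
        if not permanent_uuid or not last_known_addr:
            problems.append("ADD_SERVER needs permanent_uuid and last_known_addr")
    elif change_type == "REMOVE_SERVER":
        if use_host:
            # The peer is assumed dead: it is removed by address, not by uuid.
            if not last_known_addr:
                problems.append("REMOVE_SERVER with use_host needs last_known_addr")
        elif not permanent_uuid:
            problems.append("REMOVE_SERVER needs permanent_uuid")
    # Compare-and-swap: only enforced when the caller supplies an index.
    if cas_config_opid_index is not None and cas_config_opid_index != committed_opid_index:
        problems.append("committed config opid_index does not match")
    return problems
```

For example, `validate_change_config("REMOVE_SERVER", None, "host:9100", use_host=True)` returns an empty list, while the same call without `use_host` reports the missing uuid.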
The configuration change response. If any immediate error occurred the 'error' field is set with it.
The hybrid_time chosen by the server for this change config operation. TODO: At the time of writing, this field is never set in the response. TODO: Propagate signed hybrid_times. See KUDU-611.
Implements unsafe config change operation for manual recovery use cases.
An unsafe change configuration request for the tablet with 'tablet_id'.
UUID of server this request is addressed to.
Sender identification; it could be a static string as well.
The raft config sent to destination server. Only the 'permanent_uuid' of each peer in the config is required (address-related information is ignored by the server). The peers specified in the 'new_config' are required to be a subset of (or equal to) the peers in the committed config on the destination replica.
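The subset requirement on 'new_config' can be expressed as a set comparison on peer uuids. This is a minimal sketch with a hypothetical function name; it models each config as a plain list of 'permanent_uuid' strings and ignores addresses, as the server is described to do.

```python
from typing import Iterable


def is_acceptable_unsafe_config(new_config_uuids: Iterable[str],
                                committed_config_uuids: Iterable[str]) -> bool:
    """True if every peer in the new config is also in the committed config."""
    return set(new_config_uuids) <= set(committed_config_uuids)
```

With a committed config of peers a, b, and c, a new config of a and b passes, while a new config of a and d is rejected because d is not a current member.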
The unsafe change configuration response. 'error' field is set if operation failed.
(message has no fields)
Force this node to run a leader election.
Message that makes the local peer run leader election to be elected leader. Assumes that a tablet with 'tablet_id' exists.
UUID of server this request is addressed to.
The id of the tablet.
The id of the last operation the current leader has committed to its log. When specified, the peer must wait till the operation has been committed to its log before starting the election to be qualified as a leader.
UUID of the old leader that requested this election. If the election is lost, the originator is notified so that it can reset its withhold timeout and try to become leader again.
A generic error message (such as tablet not found).
Notify originator about lost election, so it could reset its timeout.
UUID of server this request is addressed to. It is the originator_uuid from RunLeaderElectionRequestPB.
The id of the tablet.
UUID of the server that lost the election.
A generic error message (such as tablet not found).
Force this node to step down as leader.
UUID of server this request is addressed to.
The id of the tablet.
UUID of the server that should run the election to become the new leader.
Used in tests to ignore check for new leader conditions.
If new_leader_uuid is not specified, the current leader will attempt to gracefully transfer leadership to another peer. Setting this flag disables that behavior.
A generic error message (such as tablet not found).
Time in milliseconds since previous election failure with the same intended ("protege") leader. This is used to track the remaining amount of time during which the load balancer should not attempt to ask (any) leader of this tablet to step down in favor of the same peer.
Get the latest committed or received opid on the server.
UUID of server this request is addressed to.
The id of the tablet.
Whether to return the last-received or last-committed OpId.
A generic error message (such as tablet not found).
Returns the committed Consensus state.
UUID of server this request is addressed to.
The id of the tablet.
Whether to fetch the committed or active consensus state.
A generic error message (such as tablet not found).
Allows returning the leader lease status in the same RPC. Useful for waiting for the leader to be allowed to serve requests.
Instruct this server to remotely bootstrap a tablet from another host.
UUID of server this request is addressed to.
Identification for the host we are bootstrapping from.
The caller's term. In the case that the target of this request has a TOMBSTONED replica with a term higher than this one, the request will fail.
When the remote bootstrap request is served by the follower closest to the new peer, the new peer needs to update the leader on progressing the log anchor.
If this tablet was created by cloning, the sequence number of the operation that created it and the clone source tablet's id. This is used to reject RBS requests if the source tablet is present on the destination and has not applied the clone yet.
A configuration change request for the tablet with 'tablet_id'. This message is dynamically generated by the leader when AddServer() or RemoveServer() is called, and is what gets replicated to the log.
The old committed configuration, for verification purposes.
The new configuration to apply.
Committed consensus config. This includes the consensus configuration that has been serialized through consensus and committed, thus having a valid opid_index field set.
Active consensus config. This could be a pending consensus config that has not yet been committed. If the config is not committed, its opid_index field will not be set.
Consensus-specific errors use this protobuf.
The error code.
The Status object for the error. This will include a textual message that may be more useful to present in log messages, etc, though its error code is less specific.
The codes for consensus responses. These are set in the status when some consensus internal error occurs and require special handling by the caller. A generic error code is purposefully absent since generic errors should use tserver.TabletServerErrorPB.
Invalid term. Sent by peers in response to leader RPCs whenever the term of one of the messages sent in a batch is lower than the term the peer is expecting.
For leader election. The last OpId logged by the candidate is older than the last OpId logged by the local peer.
For leader election. The local replica has already voted for another candidate in this term.
The replica does not recognize the caller's request as coming from a member of the configuration.
The responder's last entry didn't match the caller's preceding entry.
The local replica is either a leader, or has heard from a valid leader more recently than the election timeout, so believes the leader to be alive.
The local replica is in the middle of servicing either another vote or an update from a valid leader.
The local replica was unable to prepare a single transaction.
Split operation has been added to this Raft group log.
This PB is used to serialize all of the persistent state needed for Consensus that is not in the WAL, such as leader election and communication on startup.
Last-committed peership.
Latest term this server has seen. When a configuration is first created, initialized to 0. Whenever a new election is started, the candidate increments this by one and requests votes from peers. If any RPC or RPC response is received from another node containing a term higher than this one, the server should step down to FOLLOWER and set its current_term to match the caller's term. If a follower receives an UpdateConsensus RPC with a term lower than this term, then that implies that the RPC is coming from a former LEADER who has not realized yet that its term is over. In that case, we will reject the UpdateConsensus() call with ConsensusErrorPB::INVALID_TERM. If a follower receives a RequestConsensusVote() RPC with an earlier term, the vote is denied.
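The term rules described above can be summarized in a few small functions. This is a sketch only: the class, the function names, and the role strings are hypothetical, and clearing the recorded vote on a term change is inferred from the description of the vote field as being scoped to 'current_term'.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConsensusMeta:
    """Hypothetical stand-in for the persisted consensus metadata."""
    current_term: int = 0            # initialized to 0 for a new configuration
    voted_for: Optional[str] = None  # vote cast in current_term, if any
    role: str = "FOLLOWER"


def observe_term(meta: ConsensusMeta, term: int) -> None:
    """Any RPC or response carrying a higher term forces a step down."""
    if term > meta.current_term:
        meta.current_term = term
        meta.voted_for = None
        meta.role = "FOLLOWER"


def start_election(meta: ConsensusMeta) -> int:
    """A candidate increments the term by one before requesting votes."""
    meta.current_term += 1
    meta.voted_for = None
    return meta.current_term


def check_update_term(meta: ConsensusMeta, caller_term: int) -> Optional[str]:
    """Reject UpdateConsensus calls from a former leader with a stale term."""
    if caller_term < meta.current_term:
        return "INVALID_TERM"
    observe_term(meta, caller_term)
    return None


def vote_denied_for_stale_term(meta: ConsensusMeta, candidate_term: int) -> bool:
    """A vote request with an earlier term is denied."""
    return candidate_term < meta.current_term
```

A real server would persist the metadata before acting on a term change; the sketch leaves persistence out.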
Permanent UUID of the candidate voted for in 'current_term', or not present if no vote was made in the current term.
A consensus request message, the basic unit of a consensus round.
Used as request type in: ConsensusService.UpdateConsensus
UUID of server this request is addressed to.
The uuid of the peer making the call.
The caller's term. As only leaders can send messages, replicas will accept all messages as long as the term is equal to or higher than the last term they know about. If a leader receives a request with a term higher than its own, it will step down and enter FOLLOWER state (see Raft sec. 5.1).
The id of the operation immediately preceding the first operation in 'ops'. If the replica is receiving 'ops' for the first time 'preceding_id' must match the replica's last operation. This must be set if 'ops' is non-empty.
The id of the last committed operation in the configuration. This is the id of the last operation the leader deemed committed from a consensus standpoint (not the last operation the leader applied). Raft calls this field 'leaderCommit'.
Sequence of operations to be replicated by this peer. These will be committed when committed_index advances above their respective OpIds. In some cases committed_index can indicate that these operations are already committed, in which case they will be committed during the same request.
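The interaction between 'preceding_id', 'ops', and the committed index can be sketched as follows. This is a simplified model, not the real replica logic: the names are hypothetical, the log is a plain list of OpIds, the request is assumed to be the first delivery of 'ops', an empty log is assumed to match a 'preceding_id' of None, and an operation is treated as committed when its index is at or below the committed index, following the usual Raft convention.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class OpId:
    term: int
    index: int


def handle_update(log: List[OpId], preceding_id: Optional[OpId], ops: List[OpId],
                  committed_index: int) -> Tuple[Optional[str], List[OpId]]:
    """Append 'ops' if the log matches, then report what is now committed."""
    if ops:
        last = log[-1] if log else None
        # Log matching: the op preceding the batch must be our last op.
        if preceding_id != last:
            return "PRECEDING_ENTRY_DIDNT_MATCH", []
        log.extend(ops)
    # Ops at or below the leader's committed index are committed, possibly
    # including ops that arrived in this same request.
    committed = [op for op in log if op.index <= committed_index]
    return None, committed
```

A request with empty 'ops' still carries the committed index, so it can commit operations that were appended by an earlier request.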
Leader lease duration in milliseconds. A leader is not allowed to serve up-to-date reads until it is able to replicate a lease extension. A new leader cannot assume its responsibilities until this amount of time has definitely passed since the old leader sent the consensus request. Due to potential clock skew, we are not sending a timestamp, but an amount of time followers have to wait.
Leader lease expiration, physical part of hybrid time. A new leader cannot add new entries to the Raft log until hybrid time passes this expiration.
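Because the lease is sent as a duration rather than a timestamp, each follower can convert it into a deadline on its own clock. The sketch below illustrates that idea with a hypothetical class; times are integer milliseconds from any local monotonic clock, and the names are not taken from the actual implementation.

```python
class OldLeaderLease:
    """Tracks how long a would-be leader must wait out the old leader's lease."""

    def __init__(self) -> None:
        self._expires_at_ms = 0

    def on_consensus_request(self, lease_duration_ms: int, now_ms: int) -> None:
        # The deadline is computed from the local receive time, so no clock
        # synchronization with the old leader is needed.
        self._expires_at_ms = max(self._expires_at_ms, now_ms + lease_duration_ms)

    def remaining_ms(self, now_ms: int) -> int:
        return max(0, self._expires_at_ms - now_ms)

    def new_leader_may_serve(self, now_ms: int) -> bool:
        return now_ms >= self._expires_at_ms
```

Since the request was sent before it was received, a deadline measured from the receive time is an upper bound on the old leader's lease, which is the conservative direction.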
Safe time measured on the leader at the time this request was generated.
Hybrid time on the leader when this request was generated.
If enabled, a trace will be collected for this RPC and returned in the response.
Used as response type in: ConsensusService.UpdateConsensus
The uuid of the peer making the response.
The current term of the peer making the response. This is used to update the caller (and make it step down if it is out of date).
The current consensus status of the receiver peer.
A generic error message (such as tablet not found). Per-operation error messages are sent along with the consensus status.
Number of SST files in this follower regular db.
Hybrid time on the follower when this request was processed.
The trace for the RPC as a string, returned only if requested.
Represents a snapshot of a configuration at a given moment in time.
A configuration is always guaranteed to have a known term.
There may not always be a leader of a configuration at any given time. The node that the local peer considers to be leader changes based on rules defined in the Raft specification. Roughly, this corresponds either to being elected leader (in the case that the local peer is the leader), or when an update is accepted from another node, which basically just amounts to a term check on the UpdateConsensus() RPC request. Whenever the local peer sees a new term, the leader flag is cleared until a new leader is acknowledged based on the above criteria. Simply casting a vote for a peer is not sufficient to assume that that peer has won the election, so we do not update this field based on our vote. The leader listed here, if any, should always be a member of 'configuration', and the term that the node is leader of _must_ equal the term listed above in the 'current_term' field. The Master will use the combination of current term and leader uuid to determine when to update its cache of the current leader for client lookup purposes. There is a corner case in Raft where a node may be elected leader of a pending (uncommitted) configuration. In such a case, if the leader of the pending configuration is not a member of the committed configuration, and it is the committed configuration that is being reported, then the leader_uuid field should be cleared by the process filling in the ConsensusStatePB object.
The peers. In some contexts, this will be the committed configuration, which will always have configuration.opid_index set. In other contexts, this may be a "pending" configuration, which is active but in the process of being committed. In any case, initial peership is set on tablet start, so this field should always be present.
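The corner case at the end of the leader description reduces to a membership check when the state is filled in. A minimal sketch, with a hypothetical function name and peers modeled as uuid strings:

```python
from typing import Iterable, Optional


def leader_uuid_to_report(leader_uuid: Optional[str],
                          reported_config_uuids: Iterable[str]) -> Optional[str]:
    """Clear the leader if it is not a member of the config being reported."""
    if leader_uuid is not None and leader_uuid not in set(reported_config_uuids):
        return None
    return leader_uuid
```

For instance, if peer d leads a pending configuration of a, b, and d, but the committed configuration being reported is a, b, and c, the reported leader is left unset.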
Status message received in the peer responses.
The last message received (and replicated) by the peer.
The id of the last op that was replicated by the current leader. This doesn't necessarily mean that the term of this op equals the current term, since the current leader may be replicating ops from a prior term. Unset if none currently received. In the case where there is a log matching property error (PRECEDING_ENTRY_DIDNT_MATCH), this field is important and may still be set, since the leader queue uses this field in conjunction with last_received to decide on the next id to send to the follower.
The last committed index that is known to the peer.
The last operation applied by the peer.
When the last request failed for some consensus related (internal) reason. In some cases the error will have a specific code that the caller will have to handle in certain ways.
The transaction driver type: indicates whether a transaction is being executed on a leader or a replica.
Required by HistoryCutoffOperation.
History cutoff of all the tables of this tablet. On the master, this field is used for the sys catalog table. For cotables i.e. ysql system tables, the cotables_cutoff_ht field is used. On tservers, this field is used everywhere i.e. for both colocated and non-colocated tables. In such cases, the cotables_cutoff_ht is invalid.
Set only on the master for cotables.
NoOp requests, mostly used in tests.
Allows setting a dummy payload for tests.
NoOp responses, mostly used in tests.
Allows setting a dummy payload for tests.
A message reflecting the status of an in-flight transaction.
Time the transaction has been in flight.
Quick human-readable description (e.g., ToString() output).
If tracing is enabled when viewing the transaction, the trace buffer is copied here.
The types of operations that need a commit message, i.e. those that require at least one round of the consensus algorithm.
Any server added into a running consensus with the intention of becoming a VOTER should be added as a PRE_VOTER. Such a server stays as a PRE_VOTER (for a specific tablet) till it is remote bootstrapped, after which it is promoted to VOTER. While in this mode, a server will not vote nor try to become a leader.
Any server added into a running consensus with the intention of becoming an async replica (OBSERVER) should be added as a PRE_OBSERVER. Such a server stays as a PRE_OBSERVER (for a specific tablet) till it is remote bootstrapped, after which it is promoted to OBSERVER. While in this mode, a server will not vote nor try to become a leader.
Async replication mode. An OBSERVER doesn't participate in any decisions regarding the consensus configuration. It only accepts update requests and allows read requests.
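The member types described above form a small promotion scheme. The sketch below models it with a Python enum; the enum mirrors the four names used in this document, while the helper functions are hypothetical.

```python
from enum import Enum


class PeerMemberType(Enum):
    PRE_VOTER = "PRE_VOTER"
    VOTER = "VOTER"
    PRE_OBSERVER = "PRE_OBSERVER"
    OBSERVER = "OBSERVER"


# Promotion that happens once remote bootstrap of the tablet has finished.
_PROMOTION = {
    PeerMemberType.PRE_VOTER: PeerMemberType.VOTER,
    PeerMemberType.PRE_OBSERVER: PeerMemberType.OBSERVER,
}


def promote_after_bootstrap(member_type: PeerMemberType) -> PeerMemberType:
    """PRE_* members are promoted; VOTER and OBSERVER stay as they are."""
    return _PROMOTION.get(member_type, member_type)


def may_vote_or_lead(member_type: PeerMemberType) -> bool:
    """Only full voters take part in elections and config decisions."""
    return member_type is PeerMemberType.VOTER
```

Keeping the promotion in a lookup table makes it explicit that an OBSERVER is never promoted to VOTER by bootstrap alone.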
The id of the operation that failed in the other peer.
The Status explaining why the operation failed.
A set of peers, serving a single tablet.
The index of the operation which serialized this RaftConfigPB through consensus. It is set when the operation is consensus-committed (replicated to a majority of voters) and before the consensus metadata is updated. It is left undefined if the operation isn't committed.
Flag to allow unsafe config change operations.
The set of peers in the configuration.
A peer in a configuration.
Permanent uuid is optional: RaftPeerPB/RaftConfigPB instances may be created before the permanent uuid is known (e.g., when manually specifying a configuration for Master/CatalogManager); the permanent uuid can be retrieved at a later time through RPC.
A Replicate message, sent to replicas by leader to indicate this operation must be stored in the write-ahead log.
The Raft operation ID (term and index) being replicated.
The hybrid time assigned to this message.
A counter that is forever increasing in a tablet (like hybrid time). Used for list indexing.
The Raft operation ID known to the leader to be committed at the time this message was sent. This is used during tablet bootstrap for RocksDB-backed tables.
Retryable requests.