Proto commits in neondatabase/neon

The Protocol Buffers files changed in these 47 commits:

Commit:0f3c069
Author:Erik Grinaker

Add TODO for batch vs. individual responses

Commit:f94d64a
Author:Erik Grinaker

Remove `GetPageRequestBatch`, add `GetPageRequest.block_count`

Commit:6b92b97
Author:Erik Grinaker

Remove `ReadLsn` TODO

Commit:dc93b22
Author:Erik Grinaker

Rename `RequestCommon` to `ReadLsn`

Commit:ae97130
Author:Erik Grinaker

Use verbs for all methods

Commit:03d977e
Author:Erik Grinaker

Prefix `id` and `class` with `request_` and group

Commit:30fed81
Author:Erik Grinaker

Remove `optional` on `reason`

Commit:67dcbd9
Author:Erik Grinaker

EOF

Commit:204b725
Author:Erik Grinaker

pageserver: add gRPC page service schema

Commit:cf5d038
Author:Erik Grinaker

service documentation

Commit:d785100
Author:Erik Grinaker

page_api: add `GetPageRequest::class`

Commit:2c0d930
Author:Erik Grinaker

page_api: add `GetPageResponse::status`

Commit:66171a1
Author:Erik Grinaker

page_api: add `GetPageRequestBatch`

Commit:df2806e
Author:Erik Grinaker

page_api: add `GetPageRequest::id`

Commit:0763169
Author:Erik Grinaker

page_api: protobuf comments

Commit:4c77397
Author:Erik Grinaker

Add `neon-shard-id` header

Commit:7bb58be
Author:Erik Grinaker

Use `authorization` header instead of `neon-auth-token`

Commit:b5373de
Author:Erik Grinaker

page_api: add `get_slru_segment()`

Commit:0f520d7
Author:Erik Grinaker
Committer:Erik Grinaker

pageserver: rename `data_api` to `page_api`

Commit:93eb7bb
Author:Heikki Linnakangas

include lots of changes that went missing by accident

Commit:e58d0fe
Author:Heikki Linnakangas

New communicator, with "integrated" cache accessible from all processes

Commit:391a287
Author:Erik Grinaker

communicator: add streaming GetPage RPC

Commit:b3297df
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

git add yet another missing file

Commit:dce99ba
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

Implement the remaining API functions. Now when you set neon.enable_new_communicator feature flag, the new communicator is used for everything. The old communicator code is not even initialized anymore. Everything is super slow, because we now open a new HTTP/2 connection for every GetPage request.

Commit:38293be
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

WIP: move code around for clarity

There's now a client_grpc package, fully independent of PostgreSQL and the neon extension. There's also a separate package for the "model" structs representing the API and the 'prost' generated code, which is shared by client and server

The documentation is generated from this commit.

Commit:b616ad8
Author:Heikki Linnakangas

WIP: Skeleton version of the new communicator

It's only used for dbsize requests at the moment.

Commit:eb4506e
Author:Heikki Linnakangas

git add yet another missing file

Commit:6092caf
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

WIP: move code around for clarity

There's now a client_grpc package, fully independent of PostgreSQL and the neon extension. There's also a separate package for the "model" structs representing the API and the 'prost' generated code, which is shared by client and server

The documentation is generated from this commit.

Commit:57765a7
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

WIP: Skeleton version of the new communicator

It's only used for dbsize requests at the moment.

Commit:1287767
Author:Heikki Linnakangas
Committer:Heikki Linnakangas

Implement the remaining API functions. Now when you set neon.enable_new_communicator feature flag, the new communicator is used for everything. The old communicator code is not even initialized anymore. Everything is super slow, because we now open a new HTTP/2 connection for every GetPage request.

Commit:181af30
Author:Dmitrii Kovalkov
Committer:GitHub

storcon + safekeeper + scrubber: propagate root CA certs everywhere (#11418)

## Problem

There are some places in the code where we create `reqwest::Client` without providing SSL CA certs from `ssl_ca_file`. These will break after we enable TLS everywhere.

- Part of https://github.com/neondatabase/cloud/issues/22686

## Summary of changes

- Support `ssl_ca_file` in storage scrubber.
- Add `use_https_safekeeper_api` option to safekeeper to use https for peer requests.
- Propagate SSL CA certs to storage_controller/client, storcon's ComputeHook, PeerClient and maybe_forward.

The documentation is generated from this commit.

Commit:3499641
Author:Vlad Lazar
Committer:GitHub

pageserver: guard against WAL gaps in the interpreted protocol (#10858)

## Problem

The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point).

## Summary of changes

Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted. Validation is enabled via a config flag on the pageserver and works as follows:

**Case 1**: `raw_wal_start_lsn` is smaller than the requested LSN
There can't be gaps here, but we check that the shard received records which it hasn't seen before.

**Case 2**: `raw_wal_start_lsn` is equal to the requested LSN
This is the happy case. No gap and nothing to check.

**Case 3**: `raw_wal_start_lsn` is greater than the requested LSN
This is a gap.

To make Case 3 work I had to bend the protocol a bit. We read record chunks of WAL which aren't record aligned and feed them to the decoder. The picture below shows a shard which subscribes at a position somewhere within Record 2. We already have a wal reader which is below that position so we wait to catch up. We read some wal in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't need Record 1 (it has already processed it according to the starting position), but we read past its starting position. When we do Read 2, we decode Record 2 and ship it off to the shard, but the starting position of Read 2 is greater than the starting position the shard requested. This looks like a gap.

![image](https://github.com/user-attachments/assets/8aed292e-5d62-46a3-9b01-fbf9dc25efe0)

To make it work, we extend the protocol to send an empty `InterpretedWalRecords` to shards if the WAL the records originated from ends the requested start position. On the pageserver, that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for the fake gap.

As a drive-by, make `InterpretedWalRecords::next_record_lsn` mandatory in the application level definition. It's always included.

Related: https://github.com/neondatabase/cloud/issues/23935
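
The three validation cases in this commit message can be sketched as a simple comparison. This is a minimal illustration, not the pageserver's actual code: the `GapCheck` enum and `classify` function are hypothetical names, and LSNs are modeled as plain `u64` values.

```rust
use std::cmp::Ordering;

/// Outcome of comparing the start LSN of a received batch with the LSN
/// the shard asked for. Hypothetical type, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum GapCheck {
    /// Case 1: the batch starts before the requested LSN. No gap is
    /// possible, but the receiver should check that the records are new.
    CheckNewRecords,
    /// Case 2: the batch starts exactly at the requested LSN.
    NoGap,
    /// Case 3: the batch starts after the requested LSN.
    Gap,
}

/// Classify a batch by comparing its raw WAL start LSN with the requested LSN.
fn classify(raw_wal_start_lsn: u64, requested_lsn: u64) -> GapCheck {
    match raw_wal_start_lsn.cmp(&requested_lsn) {
        Ordering::Less => GapCheck::CheckNewRecords,
        Ordering::Equal => GapCheck::NoGap,
        Ordering::Greater => GapCheck::Gap,
    }
}
```

The empty `InterpretedWalRecords` message described above exists so that the "fake gap" from unaligned reads does not land in the third branch.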

Commit:4b04e3b
Author:Christian Schwarz

experiment: find all places that need new rclsn-by-generation facility

delete all the code that uses current inmem & controlfile remote_consistent_lsn field

After this removal, search for `remote_consistent_lsn` highlights the places where the new solution that tracks rclsn per generation id needs to be fitted in

Commit:fde1046
Author:Vlad Lazar
Committer:Vlad Lazar

wal_decoder: fix compact key protobuf encoding (#10074)

## Problem

Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. Issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes

Use unsigned integers when encoding into Protobuf.

## Deployment

* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here.
* Pre-prod: Same as staging

Commit:665369c
Author:Vlad Lazar
Committer:GitHub

wal_decoder: fix compact key protobuf encoding (#10074)

## Problem

Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. Issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes

Use unsigned integers when encoding into Protobuf.

## Deployment

* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here.
* Pre-prod: Same as staging
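
The sign problem described in this commit message can be reproduced in a few lines. The sketch below assumes the compact key is an `i128`, as the message implies; the four function names are hypothetical and are not taken from the `wal_decoder` crate.

```rust
/// Buggy variant: both halves are stored as signed 64-bit integers.
fn split_signed(key: i128) -> (i64, i64) {
    ((key >> 64) as i64, key as i64)
}

/// Rejoining sign-extends `low`, so a low half with bit 63 set
/// overwrites the upper 64 bits with ones and the key turns negative.
fn join_signed(high: i64, low: i64) -> i128 {
    ((high as i128) << 64) | (low as i128)
}

/// Fixed variant: both halves are stored as unsigned 64-bit integers.
fn split_unsigned(key: i128) -> (u64, u64) {
    ((key >> 64) as u64, key as u64)
}

/// Rejoining zero-extends `low`, so the upper half is left untouched.
fn join_unsigned(high: u64, low: u64) -> i128 {
    ((high as i128) << 64) | (low as i128)
}
```

With a key whose lower half has its top bit set, for example `(1i128 << 64) | (1i128 << 63)`, the signed round trip returns a negative value, while the unsigned round trip returns the original key.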

Commit:9e0148d
Author:Vlad Lazar
Committer:GitHub

safekeeper: use protobuf for sending compressed records to pageserver (#9821)

## Problem

https://github.com/neondatabase/neon/pull/9746 lifted decoding and interpretation of WAL to the safekeeper. This reduced the ingested amount on the pageservers by around 10x for a tenant with 8 shards, but doubled the ingested amount for single sharded tenants.

Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which doesn't support schema evolution. Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes

This patch set addresses both problems by adding protobuf support for the interpreted wal records and adding compression support. Compressed protobuf reduced the ingested amount by 100x on the 32 shards `test_sharded_ingest` case (compared to non-interpreted proto). For the 1 shard case the reduction is 5x.

Sister change to `rust-postgres` is [here](https://github.com/neondatabase/rust-postgres/pull/33).

## Links

Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329

Commit:b32c6a8
Author:Vlad Lazar
Committer:Vlad Lazar

wal_decoder: generate protobuf types

Commit:478cc37
Author:Arseny Sher
Committer:Arseny Sher

Propagate standby apply LSN to pageserver to hold off GC.

To avoid pageserver gc'ing data needed by standby, propagate standby apply LSN through standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Iteration of GC resets the value to remove the horizon when standby goes away -- pushes are assumed to happen at least once between gc iterations. As a safety guard max allowed lag compared to normal GC horizon is hardcoded as 10GB. Add test for the feature.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
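
One way to picture the hold-off rule is as a cutoff calculation with a lag guard. The sketch below is an assumption-laden illustration rather than the pageserver's logic: the function name, the `u64` LSN model, the reading of 10GB as 10 GiB, and the choice to ignore a standby that lags too far (instead of clamping) are all hypothetical.

```rust
/// Assumed value of the hardcoded safety guard (10 GiB of WAL).
const MAX_ALLOWED_STANDBY_LAG: u64 = 10 * 1024 * 1024 * 1024;

/// Pick the LSN below which GC may remove data.
///
/// `normal_cutoff` is the cutoff GC would use without any standby.
/// `standby_apply_lsn` is the most recently reported standby apply LSN, if any.
fn gc_cutoff(normal_cutoff: u64, standby_apply_lsn: Option<u64>) -> u64 {
    match standby_apply_lsn {
        // A standby behind the normal cutoff holds GC back, but only
        // while its lag stays within the safety guard.
        Some(lsn) if lsn < normal_cutoff => {
            if normal_cutoff - lsn <= MAX_ALLOWED_STANDBY_LAG {
                lsn
            } else {
                normal_cutoff
            }
        }
        // No standby, or a standby already past the cutoff: no change.
        _ => normal_cutoff,
    }
}
```

Passing `None` models the reset described above, where a GC iteration clears the value so that a departed standby stops holding GC back.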

Commit:9d984ed
Author:Konstantin Knizhnik
Committer:Konstantin Knizhnik

Replace latest with horizon in get_page request

Commit:8cfc5ad
Author:Konstantin Knizhnik
Committer:Konstantin Knizhnik

Propagate repply_flush_lsn from SK to PS to prevent GC from collecting objects which may be still requested by replica

Commit:613906a
Author:Arthur Petukhovsky
Committer:GitHub

Support custom types in broker (#5761) Old methods are unchanged for backwards compatibility. Added `SafekeeperDiscoveryRequest` and `SafekeeperDiscoveryResponse` types to serve as example, and also as a prerequisite for https://github.com/neondatabase/neon/issues/5471

Commit:73a2c9e
Author:Arthur Petukhovsky
Committer:Arthur Petukhovsky

WIP Safekeeper broker discovery

Commit:e98580b
Author:Arseny Sher
Committer:Arseny Sher

Add term and http endpoint to broker messaged SkTimelineInfo. We need them for safekeeper peer recovery https://github.com/neondatabase/neon/pull/4875

Commit:814abd9
Author:Arthur Petukhovsky
Committer:GitHub

Switch to safekeeper in the same AZ (#3883)

Add a condition to switch walreceiver connection to safekeeper that is located in the same availability zone. Switch happens when commit_lsn of a candidate is not less than commit_lsn from the active connection. This condition is expected not to trigger instantly, because commit_lsn of a current connection is usually greater than commit_lsn of updates from the broker. That means that if WAL is written continuously, switch can take a lot of time, but it should happen eventually.

Now protoc 3.15+ is required for building neon.

Fixes https://github.com/neondatabase/neon/issues/3200
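
The switch condition can be sketched as a small predicate. The struct, its field names, and the function name are hypothetical; the extra check that the active connection is not already in the local zone is an assumption added for the illustration, not something the commit message states.

```rust
/// Minimal view of a safekeeper as seen by the walreceiver (hypothetical).
struct SafekeeperInfo {
    availability_zone: Option<String>,
    commit_lsn: u64,
}

/// Return true when the connection should move to `candidate`, which must be
/// in the pageserver's own availability zone and must not be behind the
/// active connection.
fn should_switch_to_same_az(
    pageserver_az: &str,
    active: &SafekeeperInfo,
    candidate: &SafekeeperInfo,
) -> bool {
    let candidate_in_az = candidate.availability_zone.as_deref() == Some(pageserver_az);
    // Assumption: no point switching if we are already connected in-zone.
    let active_in_az = active.availability_zone.as_deref() == Some(pageserver_az);
    // "Not less than" from the commit message maps to `>=`.
    candidate_in_az && !active_in_az && candidate.commit_lsn >= active.commit_lsn
}
```

Because the active connection's `commit_lsn` usually runs ahead of broker updates, a predicate like this would tend to return `false` while WAL is being written continuously, which matches the delayed switch the message describes.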

Commit:2d42f84
Author:Arseny Sher
Committer:Arseny Sher

Add storage_broker binary.

Which ought to replace etcd. This patch only adds the binary and adjusts Dockerfile to include it; subsequent ones will add deploy of helm chart and the actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394

Commit:3aad65e
Author:Arseny Sher
Committer:Arseny Sher

Add neon_broker binary.

Which ought to replace neon_broker. This patch only adds the binary and adjusts Dockerfile to include it; subsequent ones will add deploy of helm chart and the actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394

Commit:a40e1ed
Author:Arseny Sher
Committer:Arseny Sher

Tonic based broker for SkTimelineInfo.