The Protocol Buffers files changed in these 47 commits:
Commit: | 0f3c069 | |
---|---|---|
Author: | Erik Grinaker |
Add TODO for batch vs. individual responses
Commit: | f94d64a | |
---|---|---|
Author: | Erik Grinaker |
Remove `GetPageRequestBatch`, add `GetPageRequest.block_count`
Commit: | 6b92b97 | |
---|---|---|
Author: | Erik Grinaker |
Remove `ReadLsn` TODO
Commit: | dc93b22 | |
---|---|---|
Author: | Erik Grinaker |
Rename `RequestCommon` to `ReadLsn`
Commit: | ae97130 | |
---|---|---|
Author: | Erik Grinaker |
Use verbs for all methods
Commit: | 03d977e | |
---|---|---|
Author: | Erik Grinaker |
Prefix `id` and `class` with `request_` and group
Commit: | 30fed81 | |
---|---|---|
Author: | Erik Grinaker |
Remove `optional` on `reason`
Commit: | 67dcbd9 | |
---|---|---|
Author: | Erik Grinaker |
EOF
Commit: | 204b725 | |
---|---|---|
Author: | Erik Grinaker |
pageserver: add gRPC page service schema
Commit: | cf5d038 | |
---|---|---|
Author: | Erik Grinaker |
service documentation
Commit: | d785100 | |
---|---|---|
Author: | Erik Grinaker |
page_api: add `GetPageRequest::class`
Commit: | 2c0d930 | |
---|---|---|
Author: | Erik Grinaker |
page_api: add `GetPageResponse::status`
Commit: | 66171a1 | |
---|---|---|
Author: | Erik Grinaker |
page_api: add `GetPageRequestBatch`
Commit: | df2806e | |
---|---|---|
Author: | Erik Grinaker |
page_api: add `GetPageRequest::id`
Commit: | 0763169 | |
---|---|---|
Author: | Erik Grinaker |
page_api: protobuf comments
Commit: | 4c77397 | |
---|---|---|
Author: | Erik Grinaker |
Add `neon-shard-id` header
Commit: | 7bb58be | |
---|---|---|
Author: | Erik Grinaker |
Use `authorization` header instead of `neon-auth-token`
Commit: | b5373de | |
---|---|---|
Author: | Erik Grinaker |
page_api: add `get_slru_segment()`
Commit: | 0f520d7 | |
---|---|---|
Author: | Erik Grinaker | |
Committer: | Erik Grinaker |
pageserver: rename `data_api` to `page_api`
Commit: | 93eb7bb | |
---|---|---|
Author: | Heikki Linnakangas |
include lots of changes that went missing by accident
Commit: | e58d0fe | |
---|---|---|
Author: | Heikki Linnakangas |
New communicator, with "integrated" cache accessible from all processes
Commit: | 391a287 | |
---|---|---|
Author: | Erik Grinaker |
communicator: add streaming GetPage RPC
Commit: | b3297df | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
git add yet another missing file
Commit: | dce99ba | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
Implement the remaining API functions. Now when you set neon.enable_new_communicator feature flag, the new communicator is used for everything. The old communicator code is not even initialized anymore. Everything is super slow, because we now open a new HTTP/2 connection for every GetPage request.
Commit: | 38293be | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
WIP: move code around for clarity

There's now a client_grpc package, fully independent of PostgreSQL and the neon extension. There's also a separate package for the "model" structs representing the API and the 'prost' generated code, which is shared by client and server
The documentation is generated from this commit.
Commit: | b616ad8 | |
---|---|---|
Author: | Heikki Linnakangas |
WIP: Skeleton version of the new communicator

It's only used for dbsize requests at the moment.
Commit: | eb4506e | |
---|---|---|
Author: | Heikki Linnakangas |
git add yet another missing file
Commit: | 6092caf | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
WIP: move code around for clarity

There's now a client_grpc package, fully independent of PostgreSQL and the neon extension. There's also a separate package for the "model" structs representing the API and the 'prost' generated code, which is shared by client and server
The documentation is generated from this commit.
Commit: | 57765a7 | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
WIP: Skeleton version of the new communicator

It's only used for dbsize requests at the moment.
Commit: | 1287767 | |
---|---|---|
Author: | Heikki Linnakangas | |
Committer: | Heikki Linnakangas |
Implement the remaining API functions. Now when you set neon.enable_new_communicator feature flag, the new communicator is used for everything. The old communicator code is not even initialized anymore. Everything is super slow, because we now open a new HTTP/2 connection for every GetPage request.
Commit: | 181af30 | |
---|---|---|
Author: | Dmitrii Kovalkov | |
Committer: | GitHub |
storcon + safekeeper + scrubber: propagate root CA certs everywhere (#11418)

## Problem
There are some places in the code where we create `reqwest::Client` without providing SSL CA certs from `ssl_ca_file`. These will break after we enable TLS everywhere.
- Part of https://github.com/neondatabase/cloud/issues/22686

## Summary of changes
- Support `ssl_ca_file` in storage scrubber.
- Add `use_https_safekeeper_api` option to safekeeper to use https for peer requests.
- Propagate SSL CA certs to storage_controller/client, storcon's ComputeHook, PeerClient and maybe_forward.

The documentation is generated from this commit.
Commit: | 3499641 | |
---|---|---|
Author: | Vlad Lazar | |
Committer: | GitHub |
pageserver: guard against WAL gaps in the interpreted protocol (#10858)

## Problem
The interpreted SK <-> PS protocol does not guard against gaps (neither does the Vanilla one, but that's beside the point).

## Summary of changes
Extend the protocol to include the start LSN of the PG WAL section from which the records were interpreted. Validation is enabled via a config flag on the pageserver and works as follows:

**Case 1**: `raw_wal_start_lsn` is smaller than the requested LSN
There can't be gaps here, but we check that the shard received records which it hasn't seen before.

**Case 2**: `raw_wal_start_lsn` is equal to the requested LSN
This is the happy case. No gap and nothing to check.

**Case 3**: `raw_wal_start_lsn` is greater than the requested LSN
This is a gap.

To make Case 3 work I had to bend the protocol a bit. We read record chunks of WAL which aren't record aligned and feed them to the decoder. Consider a shard which subscribes at a position somewhere within Record 2. We already have a WAL reader which is below that position so we wait to catch up. We read some WAL in Read 1 (all of Record 1 and some of Record 2). The new shard doesn't need Record 1 (it has already processed it according to the starting position), but we read past its starting position. When we do Read 2, we decode Record 2 and ship it off to the shard, but the starting position of Read 2 is greater than the starting position the shard requested. This looks like a gap.

To make it work, we extend the protocol to send an empty `InterpretedWalRecords` to shards if the WAL the records originated from ends the requested start position. On the pageserver, that just updates the tracking LSNs in memory (no-op really). This gives us a workaround for the fake gap.

As a drive by, make `InterpretedWalRecords::next_record_lsn` mandatory in the application level definition. It's always included.

Related: https://github.com/neondatabase/cloud/issues/23935
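The three validation cases in the message above reduce to one comparison. The following is a minimal sketch of that comparison only, not the pageserver implementation: the `Lsn` alias, the `GapCheck` enum and the `classify` function are hypothetical names introduced for illustration, and the extra "has the shard seen these records" check from Case 1 is left out.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the real LSN type: a plain byte position in the WAL.
type Lsn = u64;

/// Outcome of comparing where a batch of interpreted records starts
/// with the LSN the shard asked for.
#[derive(Debug, PartialEq, Eq)]
enum GapCheck {
    /// Case 1: the batch starts before the requested LSN. No gap is possible,
    /// but the caller still has to check that the records are new to the shard.
    StartsBefore,
    /// Case 2: the batch starts exactly at the requested LSN. Nothing to check.
    Contiguous,
    /// Case 3: the batch starts after the requested LSN. WAL is missing in between.
    Gap,
}

/// Classify a batch according to the three cases in the commit message.
fn classify(raw_wal_start_lsn: Lsn, requested_lsn: Lsn) -> GapCheck {
    match raw_wal_start_lsn.cmp(&requested_lsn) {
        Ordering::Less => GapCheck::StartsBefore,
        Ordering::Equal => GapCheck::Contiguous,
        Ordering::Greater => GapCheck::Gap,
    }
}
```

In this sketch a batch starting at LSN 150 for a shard that requested LSN 100 is classified as `GapCheck::Gap`, which is the situation the empty `InterpretedWalRecords` message is meant to avoid for the "fake gap".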
Commit: | 4b04e3b | |
---|---|---|
Author: | Christian Schwarz |
experiment: find all places that need new rclsn-by-generation facility

delete all the code that uses current inmem & controlfile remote_consistent_lsn field

After this removal, search for `remote_consistent_lsn` highlights the places where the new solution that tracks rclsn per generation id needs to be fitted in
Commit: | fde1046 | |
---|---|---|
Author: | Vlad Lazar | |
Committer: | Vlad Lazar |
wal_decoder: fix compact key protobuf encoding (#10074)

## Problem
Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. Issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes
Use unsigned integers when encoding into Protobuf.

## Deployment
* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here.
* Pre-prod: Same as staging
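The sign problem described above can be reproduced with plain integer casts. This is a minimal sketch, not the `wal_decoder` code: the four function names are hypothetical, and the key used in the usage note is an arbitrary value chosen only because bit 63 is set.

```rust
/// Split a 128-bit key into (high, low) halves held in signed 64-bit
/// integers, mirroring the encoding the commit message describes as buggy.
fn split_signed(key: i128) -> (i64, i64) {
    ((key >> 64) as i64, key as i64)
}

/// Rejoin signed halves. Widening a negative low half to i128 sign-extends
/// it, so the upper 64 bits become all ones and the key turns negative.
fn join_signed(hi: i64, lo: i64) -> i128 {
    ((hi as i128) << 64) | (lo as i128)
}

/// Split into unsigned halves, as in the fix.
fn split_unsigned(key: i128) -> (u64, u64) {
    ((key >> 64) as u64, key as u64)
}

/// Rejoin unsigned halves. Widening a u64 zero-extends, so the low half
/// never touches the upper 64 bits.
fn join_unsigned(hi: u64, lo: u64) -> i128 {
    ((hi as i128) << 64) | (lo as i128)
}
```

With `key = (1 << 64) | (1 << 63)`, the signed round trip yields a negative value that differs from `key`, while the unsigned round trip returns `key` unchanged.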
Commit: | 665369c | |
---|---|---|
Author: | Vlad Lazar | |
Committer: | GitHub |
wal_decoder: fix compact key protobuf encoding (#10074)

## Problem
Protobuf doesn't support 128 bit integers, so we encode the keys as two 64 bit integers. Issue is that when we split the 128 bit compact key we use signed 64 bit integers to represent the two halves. This may result in a negative lower half when relnode is larger than `0x00800000`. When we convert the lower half to an i128 we get a negative `CompactKey`.

## Summary of Changes
Use unsigned integers when encoding into Protobuf.

## Deployment
* Prod: We disabled the interpreted proto, so no compat concerns.
* Staging: Disable the interpreted proto, do one release, and then release the fixed version. We do this because a negative int32 will convert to a large uint32 value and could give a key in the actual pageserver space. In production we would work around this by adding new fields to the proto and deprecating the old ones, but we can make our lives easy here.
* Pre-prod: Same as staging
Commit: | 9e0148d | |
---|---|---|
Author: | Vlad Lazar | |
Committer: | GitHub |
safekeeper: use protobuf for sending compressed records to pageserver (#9821)

## Problem
https://github.com/neondatabase/neon/pull/9746 lifted decoding and interpretation of WAL to the safekeeper. This reduced the ingested amount on the pageservers by around 10x for a tenant with 8 shards, but doubled the ingested amount for single sharded tenants.

Also, https://github.com/neondatabase/neon/pull/9746 uses bincode which doesn't support schema evolution. Technically the schema can be evolved, but it's very cumbersome.

## Summary of changes
This patch set addresses both problems by adding protobuf support for the interpreted wal records and adding compression support. Compressed protobuf reduced the ingested amount by 100x on the 32 shards `test_sharded_ingest` case (compared to non-interpreted proto). For the 1 shard case the reduction is 5x.

Sister change to `rust-postgres` is [here](https://github.com/neondatabase/rust-postgres/pull/33).

## Links
Related: https://github.com/neondatabase/neon/issues/9336
Epic: https://github.com/neondatabase/neon/issues/9329
Commit: | b32c6a8 | |
---|---|---|
Author: | Vlad Lazar | |
Committer: | Vlad Lazar |
wal_decoder: generate protobuf types
Commit: | 478cc37 | |
---|---|---|
Author: | Arseny Sher | |
Committer: | Arseny Sher |
Propagate standby apply LSN to pageserver to hold off GC.

To avoid pageserver gc'ing data needed by standby, propagate standby apply LSN through standby -> safekeeper -> broker -> pageserver flow and hold off GC for it. Iteration of GC resets the value to remove the horizon when standby goes away -- pushes are assumed to happen at least once between gc iterations. As a safety guard max allowed lag compared to normal GC horizon is hardcoded as 10GB. Add test for the feature.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
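One way to read the hold-off rule with its safety guard is sketched below. Several things here are assumptions rather than facts from the commit: the names `MAX_ALLOWED_STANDBY_LAG` and `gc_cutoff` are hypothetical, "10GB" is taken to mean 10 GiB, LSNs are modelled as plain `u64` byte positions, and a standby that lags beyond the limit is assumed to be ignored (the message does not say whether it is ignored or clamped).

```rust
/// Assumed value of the hardcoded guard: 10 GiB of WAL.
const MAX_ALLOWED_STANDBY_LAG: u64 = 10 * 1024 * 1024 * 1024;

/// Pick the LSN below which GC may remove data.
///
/// `normal_cutoff` is the cutoff GC would use without any standby.
/// `standby_apply_lsn` is the most recent apply LSN pushed for a standby,
/// or `None` if nothing was pushed since the last GC iteration.
fn gc_cutoff(normal_cutoff: u64, standby_apply_lsn: Option<u64>) -> u64 {
    match standby_apply_lsn {
        // The standby is behind the normal cutoff: hold GC back for it,
        // but only while its lag stays within the guard.
        Some(standby) if standby < normal_cutoff => {
            if normal_cutoff - standby <= MAX_ALLOWED_STANDBY_LAG {
                standby
            } else {
                normal_cutoff
            }
        }
        // No standby, or the standby is already past the normal cutoff.
        _ => normal_cutoff,
    }
}
```

Under these assumptions, `gc_cutoff(1000, Some(400))` holds GC back to 400, while a standby more than 10 GiB behind no longer delays GC at all.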
Commit: | 9d984ed | |
---|---|---|
Author: | Konstantin Knizhnik | |
Committer: | Konstantin Knizhnik |
Replace latest with horizon in get_page request
Commit: | 8cfc5ad | |
---|---|---|
Author: | Konstantin Knizhnik | |
Committer: | Konstantin Knizhnik |
Propagate repply_flush_lsn from SK to PS to prevent GC from collecting objects which may be still requested by replica
Commit: | 613906a | |
---|---|---|
Author: | Arthur Petukhovsky | |
Committer: | GitHub |
Support custom types in broker (#5761)

Old methods are unchanged for backwards compatibility. Added `SafekeeperDiscoveryRequest` and `SafekeeperDiscoveryResponse` types to serve as example, and also as a prerequisite for https://github.com/neondatabase/neon/issues/5471
Commit: | 73a2c9e | |
---|---|---|
Author: | Arthur Petukhovsky | |
Committer: | Arthur Petukhovsky |
WIP Safekeeper broker discovery
Commit: | e98580b | |
---|---|---|
Author: | Arseny Sher | |
Committer: | Arseny Sher |
Add term and http endpoint to broker messaged SkTimelineInfo. We need them for safekeeper peer recovery https://github.com/neondatabase/neon/pull/4875
Commit: | 814abd9 | |
---|---|---|
Author: | Arthur Petukhovsky | |
Committer: | GitHub |
Switch to safekeeper in the same AZ (#3883)

Add a condition to switch walreceiver connection to safekeeper that is located in the same availability zone. Switch happens when commit_lsn of a candidate is not less than commit_lsn from the active connection. This condition is expected not to trigger instantly, because commit_lsn of a current connection is usually greater than commit_lsn of updates from the broker. That means that if WAL is written continuously, switch can take a lot of time, but it should happen eventually.

Now protoc 3.15+ is required for building neon.

Fixes https://github.com/neondatabase/neon/issues/3200
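The switch condition can be written as a small predicate. This is a sketch under assumptions, not the walreceiver code: the `Safekeeper` struct and `should_switch_to_same_az` function are hypothetical, and the extra requirement that the active safekeeper is not already in the local zone is an assumption added so the predicate does not flap between two local candidates.

```rust
/// Hypothetical view of a safekeeper as seen by the walreceiver.
struct Safekeeper<'a> {
    /// Availability zone reported for this safekeeper, if known.
    availability_zone: Option<&'a str>,
    /// Latest commit_lsn known for this safekeeper.
    commit_lsn: u64,
}

/// Returns true when the connection should move from `active` to `candidate`.
///
/// The candidate must be in the local availability zone and its commit_lsn
/// must be not less than the active connection's commit_lsn.
fn should_switch_to_same_az(
    local_az: &str,
    active: &Safekeeper<'_>,
    candidate: &Safekeeper<'_>,
) -> bool {
    let candidate_is_local = candidate.availability_zone == Some(local_az);
    // Assumption: there is no reason to switch if we are already local.
    let active_is_local = active.availability_zone == Some(local_az);
    candidate_is_local && !active_is_local && candidate.commit_lsn >= active.commit_lsn
}
```

Because broker updates usually carry a slightly older commit_lsn than the live connection, the predicate tends to stay false while WAL is being written and becomes true once the candidate catches up, which matches the "should happen eventually" behaviour described in the message.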
Commit: | 2d42f84 | |
---|---|---|
Author: | Arseny Sher | |
Committer: | Arseny Sher |
Add storage_broker binary.

Which ought to replace etcd. This patch only adds the binary and adjusts Dockerfile to include it; subsequent ones will add deploy of helm chart and the actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394
Commit: | 3aad65e | |
---|---|---|
Author: | Arseny Sher | |
Committer: | Arseny Sher |
Add neon_broker binary.

Which ought to replace neon_broker. This patch only adds the binary and adjusts Dockerfile to include it; subsequent ones will add deploy of helm chart and the actual replacement.

It is a simple and fast pub-sub message bus. In this patch only safekeeper message is supported, but others can be easily added.

Compilation now requires protoc to be installed. Installing protobuf-compiler package is fine for Debian/Ubuntu.

ref
https://github.com/neondatabase/neon/pull/2733
https://github.com/neondatabase/neon/issues/2394
Commit: | a40e1ed | |
---|---|---|
Author: | Arseny Sher | |
Committer: | Arseny Sher |
Tonic based broker for SkTimelineInfo.