Proto commits in lalithsuresh/rapid

The Protocol Buffers files changed in the following 47 commits:

Commit:74d414b
Author:Lalith Suresh
Committer:GitHub

Revert "Simple anti-entropy mechanism (#24)" (#28)

This reverts commit 0a1788f671ef6690b5bd3d97951578ede9cf693a.

The documentation is generated from this commit.

Commit:51bedce
Author:Lalith Suresh
Committer:GitHub

Revert "Simple anti-entropy mechanism (#24)"

This reverts commit 0a1788f671ef6690b5bd3d97951578ede9cf693a.

The documentation is generated from this commit.

Commit:0a1788f
Author:Manuel Bernhardt
Committer:GitHub

Simple anti-entropy mechanism (#24)

* Anti-entropy mechanism

  It can happen that a node misses a part of the consensus messages whilst still being able to send out its own vote (unidirectional network partition, message overload, ...). In this case, the rest of the group will see this node as being part of the group and the monitoring mechanism will still be working as expected, but the stale node will run an old configuration.

  In order to enforce consistency in this case, the following new anti-entropy mechanism is used:

  - each node maintains a set of configurations it has been part of
  - probe messages now contain the configuration ID of the observer
  - when a node receives a probe message with a configuration ID it does not know, it will start a background task to check again after a configured timeout (1 minute by default)
  - if the configuration ID is still unknown after the timeout has been reached, the node leaves (using the LEAVE protocol)

* Allows a node to catch up if it misses a consensus round

* Removing leave when out-of-sync strategy, improve implementation
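
As a rough illustration of the mechanism described in this commit message, the Java sketch below tracks the configuration IDs a node has seen and rechecks an unknown ID after a delay. It is not Rapid's code: `ConfigurationTracker`, `recordConfiguration`, `onProbe`, `recheck` and `isOutOfSync` are hypothetical names, and the scheduled task only stands in for the "background task" mentioned above. In line with the last bullet of the message, the sketch only flags the node as out of sync and leaves the reaction (catching up or leaving) to the caller.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; none of these names are taken from Rapid itself.
final class ConfigurationTracker {
    // IDs of every configuration this node has been part of.
    private final Set<Long> knownConfigurations = ConcurrentHashMap.newKeySet();
    private final AtomicBoolean outOfSync = new AtomicBoolean(false);
    private final ScheduledExecutorService scheduler;
    private final long recheckDelayMillis;

    ConfigurationTracker(ScheduledExecutorService scheduler, long recheckDelayMillis) {
        this.scheduler = scheduler;
        this.recheckDelayMillis = recheckDelayMillis;
    }

    // Called whenever this node installs a new configuration.
    void recordConfiguration(long configurationId) {
        knownConfigurations.add(configurationId);
    }

    // Called when a probe arrives carrying the observer's configuration ID.
    // Returns true if the ID was unknown and a delayed recheck was scheduled.
    boolean onProbe(long observerConfigurationId) {
        if (knownConfigurations.contains(observerConfigurationId)) {
            return false;
        }
        scheduler.schedule(() -> recheck(observerConfigurationId),
                recheckDelayMillis, TimeUnit.MILLISECONDS);
        return true;
    }

    // Runs after the delay: the node is stale only if the ID is still unknown.
    void recheck(long configurationId) {
        if (!knownConfigurations.contains(configurationId)) {
            outOfSync.set(true);
        }
    }

    boolean isOutOfSync() {
        return outOfSync.get();
    }
}
```

Waiting before rechecking gives a node that is merely a little behind the chance to learn the new configuration through the normal protocol, so a brief lag is not mistaken for a stale view.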

Commit:941aeb3
Author:Manuel Bernhardt
Committer:GitHub

Endpoint performance and memory pressure optimization (#19)

This is a bit of a controversial change from the view point of the API, yet makes a lot of sense from the view point of performance and memory utilization for very large clusters.

The issue here is that the hostname of an Endpoint is modelled as a protobuf "string" type. This type carries with it the overhead of encoding to or decoding from UTF-8 every time a message is sent or received (and the field accessed). From the view point of the algorithms in place, there's no added value in having the endpoint host data be encoded as byte array or utf-8 encoded string. It is just data, what matters is that the ordering of the endpoints can be established. Having a string only matters at the interfaces: when configuring a hostname, when sending a message to one and when printing log statements (most of which at DEBUG/TRACE level).

Yet at the moment, when adding a new endpoint to the membership ring(s), the following code runs:

```
public java.lang.String getHostname() {
  java.lang.Object ref = hostname_;
  if (ref instanceof java.lang.String) {
    return (java.lang.String) ref;
  } else {
    com.google.protobuf.ByteString bs =
        (com.google.protobuf.ByteString) ref;
    java.lang.String s = bs.toStringUtf8();
    hostname_ = s;
    return s;
  }
}
```

For freshly received messages containing Endpoints, this means running `toStringUtf8()`, which when there are many is quite expensive in terms of CPU and memory usage.

This PR does the following:

- use `bytes` rather than `string` to encode the hostname in protobuf
- adjust all interfaces - the Cluster APIs are actually (almost) not affected since they use the `HostAndPort` construct
- use the existing underlying / existing byte array when computing the hashcode of an Endpoint in `Utils.AddressComparator`
- getting rid of the mapping between `Map<String, Metadata>` and `Map<Endpoint, Metadata>` by representing the map as two lists in protobuf (keys and values)

On local tests with 1000 concurrent nodes joining, there's a 10% improvement in memory allocation and a 20% improvement in CPU usage of the stack starting at the `TreeSet.add` method (39% vs 58%).
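
The argument that ordering endpoints needs only the raw bytes can be illustrated with a small Java sketch. It makes several assumptions: `RawEndpoint` and `RawEndpointComparator` are hypothetical stand-ins, not Rapid's `Endpoint` or `Utils.AddressComparator`, and the comparison is plain lexicographic byte order rather than the hash-based ordering the commit message refers to. What it shows is that a sorted set can be maintained without ever decoding a hostname to a `String`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

// Hypothetical stand-in for an endpoint whose hostname is kept as raw bytes.
final class RawEndpoint {
    final byte[] hostname;
    final int port;

    RawEndpoint(byte[] hostname, int port) {
        this.hostname = hostname.clone();
        this.port = port;
    }

    // Strings appear only at the interface boundary, e.g. when configuring a node.
    static RawEndpoint of(String host, int port) {
        return new RawEndpoint(host.getBytes(StandardCharsets.UTF_8), port);
    }

    // Decoding is deferred until a String is really needed, e.g. for logging.
    @Override
    public String toString() {
        return new String(hostname, StandardCharsets.UTF_8) + ":" + port;
    }
}

// Orders endpoints by unsigned byte value, then hostname length, then port.
final class RawEndpointComparator implements Comparator<RawEndpoint> {
    @Override
    public int compare(RawEndpoint a, RawEndpoint b) {
        int shared = Math.min(a.hostname.length, b.hostname.length);
        for (int i = 0; i < shared; i++) {
            int cmp = Integer.compare(a.hostname[i] & 0xff, b.hostname[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        int byLength = Integer.compare(a.hostname.length, b.hostname.length);
        return byLength != 0 ? byLength : Integer.compare(a.port, b.port);
    }
}
```

A `TreeSet<RawEndpoint>` constructed with `new RawEndpointComparator()` would then perform every `add` using byte comparisons only.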

The documentation is generated from this commit.

Commit:b04d666
Author:Manuel Bernhardt
Committer:GitHub

Proactively informing observers when shutting down (#15)

Rather than waiting for edge failure detection to kick in when a cluster has been shut down, this change proactively informs the observers of a node with a new Leaving message. In turn, the observers then trigger edge failure alerting immediately.

Adds Cluster.leaveGracefully() and Cluster.shutdown() APIs for graceful and forced shutdowns respectively. Accessing membership state after either of these APIs is invoked is illegal and will result in a thrown exception.

* Proactively informs observer nodes that the node is leaving when the cluster is shut down
* Leave notifications delivered in parallel, call to leave() protected by try/finally
* Fixing parallel leave message sending
  - tolerating failure in delivering the messages, i.e. not cancelling other notifications
  - adjusting test intervals in order to reach agreement faster
* Throw exceptions when trying to access membership state after shutting down
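
The idea of sending leave notifications in parallel while tolerating individual delivery failures can be sketched in Java as follows. This is only an illustration under stated assumptions: `LeaveNotifier`, `notifyObservers` and the `sendLeave` function are hypothetical and do not reflect Rapid's messaging interfaces. Every send is started before any result is awaited, and each failure is converted into a `false` result so that it cannot cancel the remaining notifications.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch; the names below are not part of Rapid's API.
final class LeaveNotifier {
    // Sends a leave notification to every observer and returns how many
    // deliveries succeeded. A failed delivery does not affect the others.
    static int notifyObservers(List<String> observers,
                               Function<String, CompletableFuture<Void>> sendLeave) {
        List<CompletableFuture<Boolean>> deliveries = new ArrayList<>();
        for (String observer : observers) {
            CompletableFuture<Void> sent;
            try {
                sent = sendLeave.apply(observer);
            } catch (RuntimeException e) {
                // A send that fails synchronously is treated like a failed future.
                sent = new CompletableFuture<>();
                sent.completeExceptionally(e);
            }
            // handle() maps success to true and any error to false.
            deliveries.add(sent.handle((ignored, error) -> error == null));
        }
        int delivered = 0;
        for (CompletableFuture<Boolean> delivery : deliveries) {
            if (delivery.join()) {
                delivered++;
            }
        }
        return delivered;
    }
}
```

A caller that must release resources regardless of the outcome could invoke `notifyObservers` inside a `try` block and shut down in `finally`, similar in spirit to the try/finally protection mentioned in the commit message.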

Commit:bce1e27
Author:Lalith Suresh
Committer:GitHub

Terminology edits (#11)

* Rename APIs to match observer -> subject terminology
* WatermarkBuffer -> almost-everywhere agreement filter
* Rename monitoring links -> monitoring edges
* Use cut detection terminology

Commit:11f4b73
Author:lalithsuresh

Endpoint is now tagged with metadata

Commit:bfed8d4
Author:lalithsuresh
Committer:lalithsuresh

Endpoint protobuf type now represents each node to avoid back-and-forth conversions between strings and Guava HostAndPort

Commit:80738c9
Author:lalithsuresh
Committer:lalithsuresh

Add Classic Paxos implementation for recovering from Fast Paxos conflicts

Commit:e452ed0
Author:lalithsuresh
Committer:lalithsuresh

Refactor messaging interfaces to decouple Rapid from the messaging implementation

Commit:5271b37
Author:lalithsuresh

Cleanup interface boundaries for messaging

Commit:c119ef4
Author:lalithsuresh

Netty tests

Commit:0045ce4
Author:lalithsuresh

Metadata values are now ByteStrings

Commit:80e4acd
Author:lalithsuresh
Committer:lalithsuresh

Remove back-and-forth conversions for UUIDs

Commit:67c9bdf
Author:lalithsuresh
Committer:lalithsuresh

Use best effort broadcast and re-organize executor usage.

Commit:afd9b50
Author:lalithsuresh

Avoid creating redundant copies of link-update-messages

Commit:538e894
Author:lalithsuresh
Committer:lalithsuresh

Changes to the metadata API to avoid sending strings around

Commit:e6e4b81
Author:lalithsuresh

Batch join-messages for multiple rings that are directed to the same monitor
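
One way to picture this kind of batching is to group ring numbers by the monitor they map to, so that each monitor receives a single message covering all of its rings. The Java sketch below is an assumption-laden illustration, not the code from this commit: `JoinBatcher` and `ringsPerMonitor` are hypothetical names, and monitors are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not taken from Rapid's join protocol implementation.
final class JoinBatcher {
    // monitorsByRing.get(k) is the monitor responsible for ring k.
    // Returns, per monitor, the list of ring numbers to put in one message.
    static Map<String, List<Integer>> ringsPerMonitor(List<String> monitorsByRing) {
        Map<String, List<Integer>> batches = new LinkedHashMap<>();
        for (int ring = 0; ring < monitorsByRing.size(); ring++) {
            batches.computeIfAbsent(monitorsByRing.get(ring), m -> new ArrayList<>())
                   .add(ring);
        }
        return batches;
    }
}
```

With this grouping, the number of messages sent equals the number of distinct monitors rather than the number of rings.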

Commit:252b907
Author:lalithsuresh
Committer:lalithsuresh

Support informing ProbeMessage-based failure detectors about whether a monitoree is bootstrapping

Commit:5184c26
Author:lalithsuresh
Committer:lalithsuresh

Supply executors to prevent grpc's usage of a cachedThreadPool

Commit:856c3e4
Author:lalithsuresh

Revert changes to receiving join-confirmations

Commit:4cc8f8c
Author:lalithsuresh

Refactor join protocol to be retry friendly

Commit:7f89246
Author:lalithsuresh

Metadata manager now maintains a set of key-value pairs per-node

Commit:5b7f43b
Author:lalithsuresh

Cluster can now track metadata per-node. Confined to features like roles for now.

Commit:d4a1c07
Author:lalithsuresh

Refactor repository into a parent project with modules

Commit:3798363
Author:lalithsuresh
Committer:lalithsuresh

Consensus implementation

Commit:cf1f4e7
Author:lalithsuresh

Implement monitoring support

Commit:9c28c32
Author:lalithsuresh

Avoid proposal logging by default + nits

Commit:4423070
Author:lalithsuresh

Cleanup protobuf descriptions

Commit:72c122d
Author:lalithsuresh

Cleanup protobuf descriptions

Commit:91d82a4
Author:lalithsuresh
Committer:lalithsuresh

Refactor out redundant LinkUpdateMessage class. We only use the protobuf definition now.

Commit:5305e39
Author:lalithsuresh

Implement update batching

Commit:fe196f2
Author:lalithsuresh

Use InProcessChannel for tests.

Commit:af88219
Author:lalithsuresh

Join protocol works until a configuration change. Need to stream back configuration.

Commit:731faaa
Author:lalithsuresh

Refactor code to accommodate changes to bootstrap procedure

Commit:a13daf1
Author:lalithsuresh

Checkpoint before re-working MembershipView

Commit:1edb1e2
Author:lalithsuresh

Checkpoint before gossip implementation

Commit:9c9abbc
Author:lalithsuresh

Checkpoint before async implementation

Commit:00a1c2d
Author:lalithsuresh

Test bootstrap

Commit:c87a041
Author:lalithsuresh

Improve tests and hashing stability

Commit:da5dd04
Author:lalithsuresh

Part 1 of join protocol

Commit:2da809a
Author:lalithsuresh

Prepare to implement join protocol

Commit:5b73d2c
Author:lalithsuresh

Performance improvements to MembershipView

Commit:d86641a
Author:lalithsuresh

Introduce node-id maintenance

Commit:b6c2c62
Author:lalithsuresh

First take at messaging tests with a simple broadcaster

Commit:13a4bd5
Author:lalithsuresh

Split protobuf generated definitions into multiple files

Commit:1cf205a
Author:lalithsuresh

Transition to gRPC and remove checker framework