These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):
Commit: | bfd54c0 | |
---|---|---|
Author: | Ofer Gill | |
Committer: | Ofer Gill |
Adds deterministic id selection for BackupSelector
- Adds the PlusOneBackupSelector, which starts with masterServerId + 1 as a backupServer candidate, and keeps incrementing one forward with wraparound until either finding a valid backup server, or running out of attempts and giving up.
- PlusOneBackupSelector class behaves deterministically for selecting a backupServer like MinCopysetsBackupSelector, except it ignores replicationId.
- We also add the corresponding --usePlusOneBackup command-line argument to ServerMain, similar to --useMinCopysets.

The documentation is generated from this commit.
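The selection rule described in the message for commit bfd54c0 can be sketched roughly as follows. This is a minimal illustration in Python, not the RAMCloud C++ code: the function name `select_plus_one_backup`, the `is_valid` callback, the `max_attempts` parameter, and the assumption that server ids are the integers 0 .. num_servers - 1 are all hypothetical.

```python
from typing import Callable, Optional


def select_plus_one_backup(master_id: int,
                           num_servers: int,
                           is_valid: Callable[[int], bool],
                           max_attempts: int) -> Optional[int]:
    """Return the first valid candidate after master_id, or None.

    Candidates are tried in the order master_id + 1, master_id + 2, ...,
    wrapping around to 0 after num_servers - 1. Server ids are assumed
    (for this sketch only) to be the integers 0 .. num_servers - 1.
    """
    if num_servers <= 0:
        return None
    candidate = (master_id + 1) % num_servers
    for _ in range(max_attempts):
        if is_valid(candidate):
            return candidate
        # Move one forward, wrapping around at the end of the id range.
        candidate = (candidate + 1) % num_servers
    # Ran out of attempts: give up.
    return None
```

Because the candidate order depends only on the master's id, two runs with the same cluster state pick the same backup, which is the deterministic behaviour the commit message describes.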
Commit: | 4232cee | |
---|---|---|
Author: | Yilong Li | |
Committer: | Yilong Li |
Added option --hugepage to allow using hugepages to allocate LargeBlockOfMemory. Using hugepage to back SegletAllocator::block improves the bandwidth of reading 1MB objects in the basic benchmark to the level of echo_basic since it reduces the address translation burden in the NIC (our ConnectX-2 card has very limited cache). The option is added to {ServerMain.cc, clusterperf.py, cluster.py}.
Commit: | ab4c003 | |
---|---|---|
Author: | Yilong Li | |
Committer: | Yilong Li |
Added protobuf syntax specification to all .proto files. Required by newer versions of the protobuf compiler.
Commit: | 7cba5e2 | |
---|---|---|
Author: | jspeiser |
Add cyclic replica buffer to BackupMasterRecovery. This fixes an issue where recovery used an unbounded amount of memory.
Commit: | 74ba2e1 | |
---|---|---|
Author: | Seo Jin Park |
Fixed bug: sends witnessListVersion with UnsyncedObjectRpcWrapper::send()
Commit: | 94e654c | |
---|---|---|
Author: | Seo Jin Park |
Implemented Witness w/o recovery. Most parts are implemented except the recovery. There is a performance issue that should be dug into further. Witness assignment in Witnessmanager should be changed.
Commit: | 09694f3 | |
---|---|---|
Author: | Seo Jin Park |
Decouple TabletManager::TabletState from Tablets.proto. We don't really need a protocol buffer for TabletState since it is the master's local state. Tablets.proto is actually used for serialization / deserialization for the coordinator and external storage.
Commit: | 6191b4c | |
---|---|---|
Author: | John Ousterhout |
Schedule worker threads based on RPC levels, not services. This changes the basic worker thread scheduling mechanism to eliminate the potential for distributed deadlock, which was there previously. The new system does not support per-service limits on concurrency; it only supports a single overall thread limit. The --masterServiceThreads command-line option has been replaced with --maxCores (which includes the dispatch thread as well as worker threads). Similarly, for the coordinator, the old --threads option has been replaced with --maxCores. Note: this means that all services are now concurrent, whereas in the past some services had a concurrency limit of 1; this could expose latent threading issues that were hidden previously by the lack of concurrency.
Commit: | 5ec32f3 | |
---|---|---|
Author: | William Sheu | |
Committer: | William Sheu |
Client now gets indexlet backing table information.
Commit: | bef6349 | |
---|---|---|
Author: | Collin Lee | |
Committer: | Collin Lee |
Fix minor issues from #1 code review.
Commit: | 6f17a97 | |
---|---|---|
Author: | Collin Lee | |
Committer: | Collin Lee |
Add complete LeaseManager implementation and tests.
Commit: | ae48290 | |
---|---|---|
Author: | Henry Qin | |
Committer: | Henry Qin |
Fixing the deadlock problem between migrateTablet and write RPCs.
Commit: | 303ceb0 | |
---|---|---|
Author: | Collin Lee | |
Committer: | Collin Lee |
Add LeaseManager::recover.
Commit: | 3351086 | |
---|---|---|
Author: | Collin Lee | |
Committer: | Collin Lee |
Simple LeaseManager stub.
Commit: | f2cfcfc | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Add functionality to reassign indexlet and backing table on the coordinator, AND Modify Table protobuf to store normal index-related metadata, AND Fix other lint / naming issues. Change metadata on coordinator to reflect change of ownership and contact the old owner to drop ownership and the new server to take ownership. This is also logged in external storage. Modify Table protobuf (that is logged in external storage) to have messages to store normal index metadata (though this is not yet being used — to be done in a future commit), as well as information about the indexlet (and corresponding backing table) being reassigned.
Commit: | c7ff1b4 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Random improvements in Indexing code.
1. Replaced Indexlets.proto (which was a set of Indexlet messages) with Indexlet.proto (which defines an individual indexlet) as the set defined in Indexlets.proto was never used.
2. Renamed indexletTableId to backingTableId to make the name less ambiguous.
3. Indentation and documentation improvements.

Commit: | a058c52 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Removing two wrapper functions from IndexletManager & other changes. IndexletManager had wrapper functions for addIndexlet() and deleteIndexlet() that parsed Indexlets protobuf. Moving the parsing to calling functions in MasterService. Also changing the names of two variables in Indexlets protobuf to be consistent with naming at other places in the code. Improving some documentation.
Commit: | 2b37671 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Nit: Modifying variable name in a protobuf.
Commit: | 40f7871 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
A round of post-deadline cleanup of SLIK code. Based on / inspired by code review done by John on SLIK code (originally written by various members of SLIK team). This round of changes is mostly documentation improvement, code simplification, copyright updates and small bug fixes.
Commit: | 6726c50 | |
---|---|---|
Author: | Zhihao Jia |
solve conflict
Commit: | da4bf8d | |
---|---|---|
Author: | Collin Lee | |
Committer: | Collin Lee |
Add '--allowLocalBackup' flag to masters. Masters can now be started with the '--allowLocalBackup' command-line flag to allow masters to use backups located on the same machine. Using this flag RAMCloud can be started and used on a single machine with a replication factor of 1. **Note that this is not the recommended configuration of RAMCloud.**
Commit: | de3d872 | |
---|---|---|
Author: | Zhihao Jia |
Add new Protocol Buffer message RecoveryPartition into repo. The purpose of RecoveryPartition is to provide communication protocol between coordinator and recoveryMaster during crash recovery. RecoveryPartition goes both directions during a recovery, which means coordinator sends it to recoveryMaster, and recoveryMaster will send it back to coordinator.
Commit: | 1c2f2bc | |
---|---|---|
Author: | Zhihao Jia |
make highestBTreeIdMap a local variable in MasterService
Commit: | 1560ae3 | |
---|---|---|
Author: | Zhihao Jia |
Implement new protocol buffer for recovery messages
Commit: | 6160952 | |
---|---|---|
Author: | Zhihao Jia |
Add indexlet list into Tablets.proto. The Coordinator can now send indexlet information to RecoveryMaster.
Commit: | 68263bb | |
---|---|---|
Author: | Ashish Gupta |
Object Finder updated to lookup indexes. Better indexlet allocation to avoid leak of information.
Commit: | bbbcc86 | |
---|---|---|
Author: | Stephen M. Rumble |
Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/BackupStorage.cc src/ServiceManager.cc
Commit: | 5ac34eb | |
---|---|---|
Author: | John Ousterhout |
Reworked CoordinatorServerList to store a list of pending updates in each entry and include this in the information written to external storage. This is needed to handle situations where there are multiple pending updates for single server at the time the coordinator crashes.
Commit: | f11f5cb | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Provide default values for entries, so CoordinatorServerList.cc doesn't have to initialize explicitly (fixes RAM-570).
Commit: | 4acc2b6 | |
---|---|---|
Author: | John Ousterhout |
Fixes RAM-564 (added new method TableManager::splitRecoveringTablet).
Commit: | 73008d6 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Major refactoring of CoordinatorServerList to use ExternalStorage for crash recovery. Also did a lot of internal cleaning up.
Commit: | af65ae8 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Removed all remaining LogCabin-related code from RAMCloud. Note: with this commit, RAMCloud compiles but will not pass tests or run anything interesting (among other things, enlistServer has been gutted).
Commit: | 24aa119 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Major overhaul of TableManager: switched to use ExternalStorage for information that must survive crashes, instead of LogCabin.
Commit: | 1ff0905 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Added CoordinatorUpdateManager class: tracks sequence numbers for recoverable coordinator operations.
Commit: | e2771a1 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Major refactoring of CoordinatorServerList to use ExternalStorage for crash recovery. Also did a lot of internal cleaning up.
Commit: | 5357384 | |
---|---|---|
Author: | John Ousterhout |
Removed all remaining LogCabin-related code from RAMCloud. Note: with this commit, RAMCloud compiles but will not pass tests or run anything interesting (among other things, enlistServer has been gutted).
Commit: | 9dcf624 | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Major overhaul of TableManager: switched to use ExternalStorage for information that must survive crashes, instead of LogCabin.
Commit: | db1cca6 | |
---|---|---|
Author: | Stephen M. Rumble |
Checkpoint an enormous amount of changes and bug fixes that I'll need to eventually clean up and push.
Commit: | 0b5adab | |
---|---|---|
Author: | John Ousterhout | |
Committer: | John Ousterhout |
Added CoordinatorUpdateManager class: tracks sequence numbers for recoverable coordinator operations.
Commit: | 61342da | |
---|---|---|
Author: | Stephen M. Rumble |
Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud
Commit: | c79ee95 | |
---|---|---|
Author: | Stephen M. Rumble |
Add metric to track number of completely empty segments cleaned. This doesn't really matter for our current cleaning procedure, but it lets us more accurately calculate what it would have cost were we to read segments from backups when cleaning.
Commit: | f20ca4f | |
---|---|---|
Author: | Asaf Cidon |
Moved replication id functions into a new class: ServerReplicationUpdate. Made necessary changes to LogCabin and Coordinator Recovery code.
Commit: | 82351fd | |
---|---|---|
Author: | Asaf Cidon |
Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud
Commit: | 7b2335b | |
---|---|---|
Author: | Asaf Cidon |
Merge remote branch 'origin/master' Conflicts: src/CoordinatorServerList.cc src/CoordinatorServerList.h
Commit: | 86edf68 | |
---|---|---|
Author: | Asaf Cidon |
Added Coordinator support for MinCopysets
Commit: | ed28565 | |
---|---|---|
Author: | Arjun Gopalan |
made splitTablet operation idempotent on coordinator and master. API changed such that startKeyHash and endKeyHash are no longer necessary.
Commit: | d946fc5 | |
---|---|---|
Author: | Stephen M. Rumble |
Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/MakefragTest
Commit: | af2935d | |
---|---|---|
Author: | Stephen M. Rumble |
More log/cleaner metrics, make Histogram::toString() trivially parsable.
Commit: | 16dff49 | |
---|---|---|
Author: | Stephen M. Rumble | |
Committer: | Stephen M. Rumble |
- Add disk I/O throttling to backups. It's not super-smart, but it works under heavy I/O (basically each write operation is delayed if it occurs faster than it should have given the constraint). Backups now have a --backupWriteRateLimit <MB/s> parameter.
- Fix two cost-benefit bugs. One was that we weren't subtracting the same amount of space when freeing as when appending (appends included segment metaoverhead, whereas frees did not). Second, some refactoring introduced a variable that shadowed a class member, so we weren't actually updating the spaceTime field at all. All segments appeared to have age 0. I've now screwed this up three times in a year or two. It's time to add some better tests. Hopefully ones that survive refactors.
- Add metrics to take note of segments that were freed before ever having been synced to disk. This is correct behaviour, but can lead to interesting results (disks appear faster than they are because the cleaner is cleaning segments that never made it to disk).

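The throttling idea in the first item of commit 16dff49 (delay a write if it arrives sooner than the rate limit allows) can be sketched as below. This is only an illustration under assumptions, not the backup's actual code: the class name `WriteRateLimiter`, the `delay_for` method, and the choice of 1 MB = 2^20 bytes are hypothetical. The clock value is passed in explicitly so the example is deterministic; a real caller would sleep for the returned delay.

```python
class WriteRateLimiter:
    """Delay writes so the long-run rate stays at or below a limit."""

    def __init__(self, limit_mb_per_s: float):
        # Assumption for this sketch: 1 MB is 2**20 bytes.
        self.bytes_per_s = limit_mb_per_s * 1024 * 1024
        # Earliest time (in seconds) at which the next write may start.
        self.next_allowed = 0.0

    def delay_for(self, num_bytes: int, now: float) -> float:
        """Return how many seconds a write arriving at `now` must wait."""
        start = max(now, self.next_allowed)
        # Reserve the time this write should take at the limited rate.
        self.next_allowed = start + num_bytes / self.bytes_per_s
        return start - now


limiter = WriteRateLimiter(1.0)            # 1 MB/s limit
first = limiter.delay_for(1 << 20, 0.0)    # nothing queued yet: no delay
second = limiter.delay_for(1 << 20, 0.25)  # arrives too early: must wait
```

A write that arrives after the previous reservation has expired is not delayed at all, so the limiter only intervenes under sustained load.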
Commit: | d97d566 | |
---|---|---|
Author: | Stephen M. Rumble |
- Add a few more metrics and abstract out the printing of cleaner metrics into a separate class so that other utilities (or clients) can make use of it.
- Revert the cleaner's new balancing algorithm. It works great in some cases, but not so well in others, it turns out. Go with a simpler approach that is easier to reason about. It still doesn't quite get the desired balance in some cases (large objects, high utilisation), but adding an additional compactor thread appears to rectify that. Time will tell...

Commit: | c1fa0a6 | |
---|---|---|
Author: | Asaf Cidon |
Revert "Added LogCabin support for MinCopysets." This reverts commit 4374dca891f4adff03f7873e3c60b562bbcfdd60.
Commit: | 4ae1993 | |
---|---|---|
Author: | Stephen M. Rumble |
Add whole-log metrics to track what types of objects are in all of the segments.
Commit: | 914427e | |
---|---|---|
Author: | Asaf Cidon |
Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud
Commit: | 4374dca | |
---|---|---|
Author: | Asaf Cidon |
Added LogCabin support for MinCopysets.
Commit: | e7ef7c5 | |
---|---|---|
Author: | Ankita Kejriwal |
Ensure unique (and strictly increasing) tableIds are assigned even in the face of coordinator crashes.
Commit: | 9315f48 | |
---|---|---|
Author: | Stephen M. Rumble |
- Compact segments that have a lot of tombstones and haven't been compacted recently when there are no better candidates (ones with dead object space). Tracking when tombstone space is free isn't easy, so do this instead.
- Add a standalone memory compaction benchmark (CleanerCompactionBenchmark) that is very similar to RecoverSegmentBenchmark. This makes it much easier to measure and fine tune the cleaner.

Commit: | cd37639 | |
---|---|---|
Author: | Stephen M. Rumble |
More metrics.
Commit: | e623627 | |
---|---|---|
Author: | Stephen M. Rumble |
Add more metrics to track not only the number of each log entry type we scan during cleaning, but their cumulative sizes.
Commit: | a367ea6 | |
---|---|---|
Author: | Stephen M. Rumble |
Add some more cleaner metrics.
Commit: | bad7231 | |
---|---|---|
Author: | Stephen M. Rumble |
Add more cleaner metrics.
Commit: | 2df4ab5 | |
---|---|---|
Author: | Stephen M. Rumble |
Track time distribution of how many threads are simultaneously running in the cleaner.
Commit: | 1d16a8a | |
---|---|---|
Author: | Stephen M. Rumble |
Add a --maxNonVolatileBuffers parameter to backups. This specifies the maximum number of segments that may be queued up in memory. The default is still effectively unlimited (equal to the --segmentFrames parameter), but this default has the unfortunate property that backups can exceed a machine's free memory capacity when under heavy load and get slain by the OOM killer. While here, change a few NOTICE prints to DEBUG when backups reject open requests. This is to avoid a torrent of console spam when backups are heavily loaded.
Commit: | 88065f9 | |
---|---|---|
Author: | Stephen M. Rumble |
Add a few more cleaner metrics.
Commit: | 8189565 | |
---|---|---|
Author: | Stephen M. Rumble |
- Fix another cleaner bug: we used to first wait until sufficient survivor segments were available before starting to clean. Unfortunately there was a minor bug that caused our wait method to return, erroneously promising availability of segments. The subsequent allocation would fail. This has been fixed with a simpler mechanism. We simply block in the survivor allocation method until one is available.
- Add contention counters to SpinLocks, as well as (optional) names. Make every SpinLock instantiation link into a global list that we can traverse in order to populate a ProtoBuf and suck down these stats remotely. Now MasterService::getServerStatistics() returns these and LogCleanerBenchmark emits the most-contended locks.

Commit: | 0cbfd91 | |
---|---|---|
Author: | Stephen M. Rumble |
Add commandline options to specify the max number of threads to run in MasterService and the cleaner. The defaults are still 1.
Commit: | 7fdbf2e | |
---|---|---|
Author: | Ankita Kejriwal |
Persist the server list version number across coordinator failures when there are no outstanding updates, to inform the new coordinator of the highest-numbered update sent out to the cluster before the coordinator crashed.
Commit: | bc4c8be | |
---|---|---|
Author: | Ankita Kejriwal |
Coordinator now retains update version numbers across failures.
Commit: | 64b29d6 | |
---|---|---|
Author: | Ankita Kejriwal |
In CoordinatorServerList, handling various states of a crashed server. When a server crashes, its recovery and cleanup happen as follows:
1. ServerNeedsRecovery: This server's recovery has not been completed, and needs to be (re-)started when its crash is being handled.
2. ServerCrashed: Its CRASH update is sent out to the cluster and recovery is started if its entry indicates that it needs recovery (set in the previous step).
3. ServerRecoveryCompleted: The server's recovery has completed. It now needs its REMOVE updates to be sent out to the cluster.

Each of these steps is implemented as a class that stores data and provides the required functions. They persist needed information to LogCabin, and have corresponding recovery functions to recover that operation. This change is motivated by the need to persist server list update version numbers across coordinator failures. The remainder of the code to handle that will be committed soon. (Also fixed some minor latent bugs and style issues discovered along the way.)
Commit: | 25c85f1 | |
---|---|---|
Author: | Ankita Kejriwal |
Trying to make terms related to crashed servers unambiguous. Basically, a server "crashes", and after its recovery, cluster-wide updates, and cleanup, it is "removed". More details: If a server is not responding, it is detected as "crashed" (as opposed to "down" earlier). This calls hintServerCrashed() in the Coordinator Client (as opposed to hintServerDown()), which also has a correspondingly renamed WireFormat opcode. Similarly, on the Coordinator (in CSL), serverCrashed() (as opposed to serverDown()) starts recovery, sends all required updates to the cluster, and removes the server from the server list. The status of a server in the server list is marked as "CRASHED" when it has crashed and "REMOVE" once it has been successfully recovered (and hence removed from the cluster), and any data (for example, replicas on backups) related to it can also be removed.
Commit: | d177926 | |
---|---|---|
Author: | Stephen Yang |
Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/CoordinatorServerList.cc src/MasterRecoveryManagerTest.cc
Commit: | 33c1d7f | |
---|---|---|
Author: | Stephen Yang |
== Current ==
Rewrite of the Coordinator's Server List Update mechanism. It is now more understandable with copious amounts of documentation and adds more FT in DCFT for the Coordinator. It is also written in a way that can be easily separated from the main Coordinator (TODO).
== Issues ==
Fixes RAM-484 - The mechanism for updateSlots has been completely rewritten.
Fixes RAM-487 - Assert is removed and isClusterUpToDate has been rewritten. The old check was intended to ensure that the updater thread never sleeps when there are updates. The new equivalent prevents sleeping when there's more work and lockups are only transient.
== Future ==
The highest priority is to allow the Coordinator to batch updates. This will ameliorate some jira issues. The new updater has the framework to allow this but it has not been completely implemented yet. The lowest priority is to split the updater into its own class. Again, the framework has been set up, someone just has to grind through it.
Commit: | f0ab50e | |
---|---|---|
Author: | Ankita Kejriwal |
For coordinator operation splitTablet(), doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

Commit: | 366fbbc | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
For Coordinator operation tabletRecovered(), doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

Commit: | d56c004 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
For Coordinator operations to add and drop table, doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

Commit: | 15e7fbc | |
---|---|---|
Author: | Ankita Kejriwal |
Moving hintServerDown to CoordinatorService where it makes logical sense. CoordinatorServerList has method serverDown (renamed from forceServerDown) that is in turn called by hintServerDown after CoordinatorService verifies that the server is actually down. (Note that serverDown is also called by enlistServer when the enlisting server replaces an older crashed server.)
Commit: | 163f73e | |
---|---|---|
Author: | Ankita Kejriwal |
Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/CoordinatorServerManager.cc src/CoordinatorServerManagerTest.cc src/MasterService.cc
Commit: | e1ce425 | |
---|---|---|
Author: | Ankita Kejriwal |
Moved server mgmt code from CoordinatorServerManager to CoordinatorServerList. This fixes RAM-454.
Some other changes:
- Resolves segmentation fault in CoordinatorServerList caused by generateUniqueId() function by holding a lock / requiring the caller to hold a lock. This fixes RAM-478.
- MasterRecoveryManager method startMasterRecovery() now takes the entry corresponding to the server to be recovered rather than just its id. This change has been made to prevent a deadlock. (Deadlock scenario: When a server having MasterService crashes, CoordinatorServerList::forceServerDown() calls MasterRecoveryManager::startMasterRecovery(). This in turn calls back into CoordinatorServerList to retrieve the entry corresponding to the serverId of the crashed server. Since both of these CoordinatorServerList methods need a lock, it deadlocks.)

Some minor changes:
- Small fixes of typos / other lint where I discovered them.
- CoordinatorServerList doesn't have to be compiled on clients.
- StateServerDown protobuf is now ForceServerDown.

Commit: | 296a54b | |
---|---|---|
Author: | Asaf Cidon |
Added support for the MinCopysets BackupSelector. Here's a summarized list of changes:
- I virtualized the selectSecondary function in Backup Selector
- Added a command line argument for useMinCopysets (the default is false)
- Added the MinCopysetsBackupSelector that looks for a free backup with the same replicationId as the primary replica.

Still need to add support for MinCopysets in handleBackupFailures and test everything together.
Commit: | b237f5f | |
---|---|---|
Author: | Stephen M. Rumble |
Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/BackupSelector.cc src/BackupServiceTest.cc src/CoordinatorMain.cc src/InfRcTransport.h src/Log.cc src/LogTest.cc src/Makefrag src/MasterServiceTest.cc src/ReplicaManagerTest.cc src/ReplicatedSegment.cc src/ReplicatedSegmentTest.cc src/Segment.cc src/Segment.h src/SegmentIterator.cc src/SegmentManager.cc src/SegmentManager.h src/SegmentTest.cc src/ServerMain.cc
Commit: | b493707 | |
---|---|---|
Author: | Stephen M. Rumble |
Keep track of the distribution of the number of segments on disk. Also, dump server statistics after prefill as well as after the complete benchmark in LogCleanerBenchmark.
Commit: | 5f0b0b6 | |
---|---|---|
Author: | Asaf Cidon |
Propagate the replicationId from the CoordinatorServerList to all the ServerLists.
Commit: | 8ab71e2 | |
---|---|---|
Author: | Ryan Stutsman | |
Committer: | Ryan Stutsman |
Recovery from lost open replicas no longer involves log module. When replicas of open segments may have been lost the replica manager increments an "epoch" number associated with the segment. It then tries to contact all existing (unclosed) replicas to tag them with the new epoch number and recreates a new replica with the new epoch number. It then updates the replication epoch which consists of the open segment id and epoch number on the coordinator. During recovery the coordinator can use the segment id and epoch number to filter out any invalid replicas. This decouples the backup recovery logic from the log module because there is no need to roll over to a new log head with this new technique. Open segments remain open even after experiencing and recovering from failures.
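The filtering step mentioned in commit 8ab71e2 (using the segment id and epoch number to discard invalid replicas) might look roughly like the sketch below. The `Replica` type, the `filter_replicas` function, and the exact comparison rule are assumptions made for illustration: the commit message only says that the coordinator filters on the segment id and epoch number, not how the comparison is performed.

```python
from typing import List, NamedTuple, Tuple


class Replica(NamedTuple):
    segment_id: int
    epoch: int
    closed: bool


def filter_replicas(replicas: List[Replica],
                    replication_epoch: Tuple[int, int]) -> List[Replica]:
    """Keep replicas that are not stale relative to replication_epoch.

    Assumption of this sketch: closed replicas are always usable, and an
    open replica is stale when its (segment_id, epoch) pair compares
    lower than the (segment_id, epoch) pair recorded on the coordinator.
    """
    return [r for r in replicas
            if r.closed or (r.segment_id, r.epoch) >= replication_epoch]


found = [
    Replica(segment_id=7, epoch=0, closed=False),  # tagged before the failure
    Replica(segment_id=7, epoch=1, closed=False),  # re-tagged or recreated
    Replica(segment_id=6, epoch=0, closed=True),   # closed earlier
]
usable = filter_replicas(found, replication_epoch=(7, 1))
```

Under these assumptions, the open replica still carrying epoch 0 is dropped while the segment itself stays open, which matches the commit's point that no new log head is needed.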
Commit: | 520a71a | |
---|---|---|
Author: | Stephen M. Rumble |
Merge commit '8d04ef4ac5e860b1ac79c296f1ef09cf2698c359' Conflicts: src/BackupService.cc src/Log.cc src/Log.h src/LogCleaner.cc src/LogCleaner.h src/LogIteratorTest.cc src/LogTest.cc src/MasterService.h src/ReplicaManagerTest.cc src/SegmentManager.cc src/SegmentManager.h src/SegmentManagerTest.cc src/Server.cc src/TransportManager.cc
Commit: | d5118c4 | |
---|---|---|
Author: | Stephen Yang |
Changed the Coordinator Server List Updater to handle multiple RPCs at a time. Changed the scripts to run on infrc instead of fast+udp at the discretion of John for RAM-460. Fixes RAM-453 Fixes RAM-458 -> Removed the getServerList rpc and corresponding call in FailureDetector
Commit: | 52d3993 | |
---|---|---|
Author: | Stephen M. Rumble |
Print various metrics after the log cleaner benchmark completes.
Commit: | 44450b9 | |
---|---|---|
Author: | Stephen M. Rumble |
Add more log metrics. Break out hash table's histogram into common file, make it (de)serialisable via protobufs. LogCleanerBenchmark now uses asynchronous RPCs and can pipeline a configurable number of writes to more heavily load the server.
Commit: | c0c9248 | |
---|---|---|
Author: | Ryan Stutsman | |
Committer: | Ryan Stutsman |
Rework/re-enable benchmark code. Backups no longer perform write benchmarks. It wasn't ever used and it isn't clear where they should perform the benchmark since their storage may contain actual data. If we want it later it shouldn't be too hard to add back in in a more sensible way.
Commit: | b5f32e8 | |
---|---|---|
Author: | Stephen M. Rumble |
Add a GET_LOG_METRICS rpc and a new protocol buffer that contains metrics from the Log, LogCleaner, LogSegment, SegmentManager, and SegletAllocator modules. This gives clients a big window into the inner workings of the log during operation. The regular metrics mechanism wasn't used because it is a little too basic in that it only supports fixed integer fields. We will likely want dynamic counters (e.g. per-segment) and very large pieces of statistics like histograms that will be vectors of integers.
Commit: | 49c7cd4 | |
---|---|---|
Author: | Stephen M. Rumble |
Add a GetServerConfig RPC that returns a protobuf with the contents of the server's ServerConfig class. This lets clients peek at the runtime configuration of servers. Add a handful of new runtime options for the cleaner. Thread the config object down into various constructors to pull config options, rather than manually extracting the needed fields in different layers. Perhaps the global server config should be a member of context...
Commit: | ea6ca1c | |
---|---|---|
Author: | Ankita Kejriwal |
Adding LogCabin helper method to get the entry type of a given LogCabin entry.
Commit: | 5fbb36d | |
---|---|---|
Author: | Ankita Kejriwal |
This, along with the previous commit, fixes RAM-456. Adding a new protobuf ServerUpdate, removing extra fields from ServerInformation. Modifying server related operations according to RAM-456.
Commit: | 03057b6 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Merging StateEnlistServer and ServerInformation protobufs. These protobufs stored identical information (after commit c269643). For simplicity, this commit removes StateEnlistServer protobuf. The state is stored in ServerInformation while starting to enlist a server with entry_type "ServerEnlisting", and after it has successfully enlisted with entry_type "ServerEnlisted".
Commit: | 3d92502 | |
---|---|---|
Author: | Ankita Kejriwal |
For enlist server operations in which the enlisting server replaces another server, the logging (in LogCabin) for forcing out the server being replaced and for enlisting the new server is now done separately.
Commit: | 0576d48 | |
---|---|---|
Author: | Ankita Kejriwal |
Renaming the protobuf StateHintServerDown to StateServerDown. This is because logging this state logically means that we want to force out a server, i.e., do serverDown(), rather than suspect that a server is down, i.e., hintServerDown().
Commit: | 5e49323 | |
---|---|---|
Author: | Stephen M. Rumble |
Close RAM-437 by making WallTime.cc's methods static parts of a WallTime class. Add a few unit tests while here to sanity check the clock and its conversion to unix time.
Commit: | f8e598f | |
---|---|---|
Author: | Ryan Stutsman |
Added missing file from 3b65a2.
Commit: | 50bac37 | |
---|---|---|
Author: | Ankita Kejriwal |
Reverting commit that introduced recovery path for enlistServer. This will hopefully make it simpler for Stephen to add the asynchronous updates code, and I will add this functionality back after that. This reverts commit d62fb9257c5dea228c161652de1b5053931c7052. Conflicts: src/CoordinatorServerManagerTest.cc
Commit: | d62fb92 | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
Changing enlistServer code in CoordinatorServerManager.
1. There are separate functions in the API for normal operation and recovery.
2. Adding monitor-style locking for thread safety.

Commit: | 3dbe775 | |
---|---|---|
Author: | Ankita Kejriwal |
Modifying HintServerDown logging according to new logging semantics with ServerInformation ProtoBuf.
Commit: | 57fbb8a | |
---|---|---|
Author: | Ankita Kejriwal | |
Committer: | Ankita Kejriwal |
SetMinOpenSegmentId now uses the new ServerInformation ProtoBuf to log state.