Proto commits in PlatformLab/RAMCloud

These are the commits in which the Protocol Buffers files changed (only the last 100 relevant commits are shown):

2019-10-03

Commit:	bfd54c0
Author:	Ofer Gill	2019-09-25 12:48:00 -0600
Committer:	Ofer Gill	2019-10-03 15:21:19 -0600

Adds deterministic id selection for BackupSelector.
- Adds the PlusOneBackupSelector, which starts with masterServerId + 1 as a backupServer candidate and keeps incrementing by one, with wraparound, until it either finds a valid backup server or runs out of attempts and gives up.
- The PlusOneBackupSelector class selects a backupServer deterministically, like MinCopysetsBackupSelector, except that it ignores replicationId.
- We also add the corresponding --usePlusOneBackup command-line argument to ServerMain, similar to --useMinCopysets.
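The selection loop described in this commit can be sketched as follows. This is a minimal Python illustration, not RAMCloud's C++ code. It assumes server ids are plain integers 1..num_servers, whereas real RAMCloud ServerIds also carry a generation number. The function name select_plus_one and the is_valid_backup callback are hypothetical.

```python
from typing import Callable, Optional


def select_plus_one(master_id: int, num_servers: int,
                    is_valid_backup: Callable[[int], bool],
                    max_attempts: int) -> Optional[int]:
    """Hypothetical sketch of the PlusOneBackupSelector policy.

    Starts at master_id + 1 and steps forward one id at a time, wrapping
    from num_servers back to 1, until a valid backup is found or
    max_attempts candidates have been tried.
    """
    candidate = master_id
    for _ in range(max_attempts):
        candidate = candidate % num_servers + 1  # next id, with wraparound
        if candidate != master_id and is_valid_backup(candidate):
            return candidate
    return None  # ran out of attempts: give up


# Usage: server 3 in a 5-server cluster where only server 2 is usable.
print(select_plus_one(3, 5, lambda s: s == 2, max_attempts=5))  # prints 2
```

The candidate order depends only on master_id, so a given master always tries the same backups in the same order. Unlike MinCopysetsBackupSelector, replicationId plays no role.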

The documentation is generated from this commit.

2018-11-24

Commit:	4232cee
Author:	Yilong Li	2018-11-22 00:31:30 -0800
Committer:	Yilong Li	2018-11-24 14:18:53 -0800

Added option --hugepage to allow using hugepages to allocate LargeBlockOfMemory. Using hugepages to back SegletAllocator::block improves the bandwidth of reading 1MB objects in the basic benchmark to the level of echo_basic, since it reduces the address-translation burden on the NIC (our ConnectX-2 card has a very limited cache). The option is added to {ServerMain.cc, clusterperf.py, cluster.py}.

2018-07-07

Commit:	ab4c003
Author:	Yilong Li	2018-07-07 15:33:55 -0700
Committer:	Yilong Li	2018-07-07 15:45:42 -0700

Added protobuf syntax specification to all .proto files. Required by newer versions of the protobuf compiler.

2017-06-01

Commit:	7cba5e2
Author:	jspeiser	2017-06-01 14:18:22 -0700

Add cyclic replica buffer to BackupMasterRecovery. This fixes an issue where recovery used an unbounded amount of memory.

2017-02-12

Commit:	74ba2e1
Author:	Seo Jin Park	2017-02-12 03:19:00 -0800

Fixed bug: sends witnessListVersion with UnsyncedObjectRpcWrapper::send()

2017-01-28

Commit:	94e654c
Author:	Seo Jin Park	2017-01-28 02:03:49 -0800

Implemented Witness without recovery. Most parts are implemented except recovery. There is a performance issue that should be dug into further. Witness assignment in Witnessmanager should be changed.

2016-10-14

Commit:	09694f3
Author:	Seo Jin Park	2016-10-13 20:55:29 -0700

Decouple TabletManager::TabletState from Tablets.proto. We don't really need a protocol buffer for TabletState, since it is the master's local state. Tablets.proto is actually used for serialization/deserialization for the coordinator and external storage.

2015-09-14

Commit:	6191b4c
Author:	John Ousterhout	2015-09-14 15:21:43 -0700

Schedule worker threads based on RPC levels, not services. This changes the basic worker thread scheduling mechanism to eliminate the potential for distributed deadlock, which was there previously. The new system does not support per-service limits on concurrency; it only supports a single overall thread limit. The --masterServiceThreads command-line option has been replaced with --maxCores (which includes the dispatch thread as well as worker threads). Similarly, for the coordinator, the old --threads option has been replaced with --maxCores. Note: this means that all services are now concurrent, whereas in the past some services had a concurrency limit of 1; this could expose latent threading issues that were hidden previously by the lack of concurrency.

2015-08-14

Commit:	5ec32f3
Author:	William Sheu	2015-07-06 15:11:41 -0700
Committer:	William Sheu	2015-08-14 14:25:19 -0700

Client now gets indexlet backing table information.

2015-04-10

Commit:	bef6349
Author:	Collin Lee	2015-04-09 17:53:44 -0700
Committer:	Collin Lee	2015-04-09 18:00:38 -0700

Fix minor issues from #1 code review.

2015-03-02

Commit:	6f17a97
Author:	Collin Lee	2014-09-19 11:21:05 -0700
Committer:	Collin Lee	2015-03-01 17:31:43 -0800

Add complete LeaseManager implementation and tests.

2015-01-29

Commit:	ae48290
Author:	Henry Qin	2015-01-13 15:23:53 -0800
Committer:	Henry Qin	2015-01-28 19:34:15 -0800

Fixing the deadlock problem between migrateTablet and write RPCs.

2014-10-31

Commit:	303ceb0
Author:	Collin Lee	2014-10-10 15:16:37 -0700
Committer:	Collin Lee	2014-10-31 14:31:27 -0700

Add LeaseManager::recover.

Commit:	3351086
Author:	Collin Lee	2014-10-09 17:27:04 -0700
Committer:	Collin Lee	2014-10-31 14:31:26 -0700

Simple LeaseManager stub.

2014-10-21

Commit:	f2cfcfc
Author:	Ankita Kejriwal	2014-09-25 12:00:32 -0700
Committer:	Ankita Kejriwal	2014-10-21 13:11:53 -0700

Add functionality to reassign indexlet and backing table on the coordinator, AND Modify Table protobuf to store normal index-related metadata, AND Fix other lint / naming issues. Change metadata on coordinator to reflect change of ownership and contact the old owner to drop ownership and the new server to take ownership. This is also logged in external storage. Modify Table protobuf (that is logged in external storage) to have messages to store normal index metadata (though this is not yet being used — to be done in a future commit), as well as information about the indexlet (and corresponding backing table) being reassigned.

Commit:	c7ff1b4
Author:	Ankita Kejriwal	2014-09-03 18:53:39 -0700
Committer:	Ankita Kejriwal	2014-10-21 13:11:53 -0700

Random improvements in Indexing code.
1. Replaced Indexlets.proto (which was a set of Indexlet messages) with Indexlet.proto (which defines an individual indexlet), as the set defined in Indexlets.proto was never used.
2. Renamed indexletTableId to backingTableId to make the name less ambiguous.
3. Indentation and documentation improvements.

2014-08-07

Commit:	a058c52
Author:	Ankita Kejriwal	2014-08-01 14:29:24 -0700
Committer:	Ankita Kejriwal	2014-08-06 17:37:54 -0700

Removing two wrapper functions from IndexletManager & other changes. IndexletManager had wrapper functions for addIndexlet() and deleteIndexlet() that parsed Indexlets protobuf. Moving the parsing to calling functions in MasterService. Also changing the names of two variables in Indexlets protobuf to be consistent with naming at other places in the code. Improving some documentation.

2014-07-26

Commit:	2b37671
Author:	Ankita Kejriwal	2014-07-23 16:16:54 -0700
Committer:	Ankita Kejriwal	2014-07-25 18:29:58 -0700

Nit: Modifying variable name in a protobuf.

2014-07-23

Commit:	40f7871
Author:	Ankita Kejriwal	2014-07-22 17:48:58 -0700
Committer:	Ankita Kejriwal	2014-07-22 17:51:57 -0700

A round of post-deadline cleanup of SLIK code. Based on / inspired by code review done by John on SLIK code (originally written by various members of SLIK team). This round of changes is mostly documentation improvement, code simplification, copyright updates and small bug fixes.

2014-06-10

Commit:	6726c50
Author:	Zhihao Jia	2014-06-10 01:27:37 -0700

solve conflict

2014-05-20

Commit:	da4bf8d
Author:	Collin Lee	2014-05-20 12:29:32 -0700
Committer:	Collin Lee	2014-05-20 15:38:52 -0700

Add '--allowLocalBackup' flag to masters. Masters can now be started with the '--allowLocalBackup' command-line flag to allow masters to use backups located on the same machine. Using this flag RAMCloud can be started and used on a single machine with a replication factor of 1. **Note that this is not the recommended configuration of RAMCloud.**

Commit:	de3d872
Author:	Zhihao Jia	2014-05-19 20:04:06 -0700

Add new Protocol Buffer message RecoveryPartition into the repo. The purpose of RecoveryPartition is to provide a communication protocol between the coordinator and the recoveryMaster during crash recovery. RecoveryPartition goes in both directions during a recovery: the coordinator sends it to the recoveryMaster, and the recoveryMaster sends it back to the coordinator.

2014-05-19

Commit:	1c2f2bc
Author:	Zhihao Jia	2014-05-19 03:19:01 -0700

make highestBTreeIdMap a local variable in MasterService

2014-05-13

Commit:	1560ae3
Author:	Zhihao Jia	2014-05-12 17:35:20 -0700

Implement new protocol buffer for recovery messages

2014-04-19

Commit:	6160952
Author:	Zhihao Jia	2014-04-19 02:33:42 -0700

Add indexlet list into Tablets.proto. The Coordinator can now send indexlet information to the RecoveryMaster.

2014-03-25

Commit:	68263bb
Author:	Ashish Gupta	2014-03-25 08:40:37 -0700

Object Finder updated to lookup indexes. Better indexlet allocation to avoid leak of information.

2013-12-29

Commit:	bbbcc86
Author:	Stephen M. Rumble	2013-12-29 12:35:29 -0800

Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/BackupStorage.cc src/ServiceManager.cc

2013-11-28

Commit:	5ac34eb
Author:	John Ousterhout	2013-11-27 17:35:42 -0800

Reworked CoordinatorServerList to store a list of pending updates in each entry and include this in the information written to external storage. This is needed to handle situations where there are multiple pending updates for a single server at the time the coordinator crashes.

2013-11-24

Commit:	f11f5cb
Author:	John Ousterhout	2013-11-23 19:44:43 -0800
Committer:	John Ousterhout	2013-11-23 19:53:27 -0800

Provide default values for entries, so CoordinatorServerList.cc doesn't have to initialize explicitly (fixes RAM-570).

2013-11-21

Commit:	4acc2b6
Author:	John Ousterhout	2013-11-20 21:18:00 -0800

Fixes RAM-564 (added new method TableManager::splitRecoveringTablet).

Commit:	73008d6
Author:	John Ousterhout	2013-11-19 16:34:54 -0800
Committer:	John Ousterhout	2013-11-20 18:22:17 -0800

Major refactoring of CoordinatorServerList to use ExternalStorage for crash recovery. Also did a lot of internal cleaning up.

Commit:	af65ae8
Author:	John Ousterhout	2013-10-01 17:39:19 -0700
Committer:	John Ousterhout	2013-11-20 18:22:11 -0800

Removed all remaining LogCabin-related code from RAMCloud. Note: with this commit, RAMCloud compiles but will not pass tests or run anything interesting (among other things, enlistServer has been gutted).

Commit:	24aa119
Author:	John Ousterhout	2013-09-18 16:57:14 -0700
Committer:	John Ousterhout	2013-11-20 18:22:07 -0800

Major overhaul of TableManager: switched to use ExternalStorage for information that must survive crashes, instead of LogCabin.

Commit:	1ff0905
Author:	John Ousterhout	2013-08-21 15:42:53 -0700
Committer:	John Ousterhout	2013-11-20 18:21:54 -0800

Added CoordinatorUpdateManager class: tracks sequence numbers for recoverable coordinator operations.

2013-11-20

Commit:	e2771a1
Author:	John Ousterhout	2013-11-19 16:34:54 -0800
Committer:	John Ousterhout	2013-11-19 16:36:39 -0800

Major refactoring of CoordinatorServerList to use ExternalStorage for crash recovery. Also did a lot of internal cleaning up.

2013-10-02

Commit:	5357384
Author:	John Ousterhout	2013-10-01 17:39:19 -0700

2013-09-18

Commit:	9dcf624
Author:	John Ousterhout	2013-09-18 16:57:14 -0700
Committer:	John Ousterhout	2013-09-18 16:58:52 -0700

Major overhaul of TableManager: switched to use ExternalStorage for information that must survive crashes, instead of LogCabin.

2013-09-09

Commit:	db1cca6
Author:	Stephen M. Rumble	2013-09-08 22:27:32 -0700

Checkpoint an enormous amount of changes and bug fixes that I'll need to eventually clean up and push.

2013-08-21

Commit:	0b5adab
Author:	John Ousterhout	2013-08-21 15:42:53 -0700
Committer:	John Ousterhout	2013-08-21 15:44:00 -0700

Added CoordinatorUpdateManager class: tracks sequence numbers for recoverable coordinator operations.

2013-05-22

Commit:	61342da
Author:	Stephen M. Rumble	2013-05-21 17:46:40 -0700

Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud

2013-05-21

Commit:	c79ee95
Author:	Stephen M. Rumble	2013-05-21 15:55:14 -0700

Add metric to track number of completely empty segments cleaned. This doesn't really matter for our current cleaning procedure, but it lets us more accurately calculate what it would have cost if we were to read segments from backups when cleaning.

2013-05-12

Commit:	f20ca4f
Author:	Asaf Cidon	2013-05-12 14:10:40 -0700

Moved replication id functions into a new class: ServerReplicationUpdate. Made necessary changes to LogCabin and Coordinator Recovery code.

2013-04-27

Commit:	82351fd
Author:	Asaf Cidon	2013-04-26 20:33:23 -0700

Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud

Commit:	7b2335b
Author:	Asaf Cidon	2013-04-26 17:10:54 -0700

Merge remote branch 'origin/master' Conflicts: src/CoordinatorServerList.cc src/CoordinatorServerList.h

2013-04-26

Commit:	86edf68
Author:	Asaf Cidon	2013-04-26 16:14:17 -0700

Added Coordinator support for MinCopysets

2013-04-19

Commit:	ed28565
Author:	Arjun Gopalan	2013-04-19 15:47:19 -0700

Made splitTablet operation idempotent on coordinator and master. The API changed such that startKeyHash and endKeyHash are no longer necessary.

2013-04-05

Commit:	d946fc5
Author:	Stephen M. Rumble	2013-04-04 23:19:43 -0700

Merge branch 'master' of git+ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/MakefragTest

2013-03-25

Commit:	af2935d
Author:	Stephen M. Rumble	2013-03-25 16:41:08 -0700

More log/cleaner metrics, make Histogram::toString() trivially parsable.

2013-03-22

Commit:	16dff49
Author:	Stephen M. Rumble	2013-03-22 14:31:58 -0700
Committer:	Stephen M. Rumble	2013-03-22 15:04:56 -0700

- Add disk I/O throttling to backups. It's not super-smart, but it works under heavy I/O (basically, each write operation is delayed if it occurs faster than it should have, given the constraint). Backups now have a --backupWriteRateLimit <MB/s> parameter.
- Fix two cost-benefit bugs. One was that we weren't subtracting the same amount of space when freeing as when appending (appends included segment metadata overhead, whereas frees did not). Second, some refactoring introduced a variable that shadowed a class member, so we weren't actually updating the spaceTime field at all. All segments appeared to have age 0. I've now screwed this up three times in a year or two. It's time to add some better tests, hopefully ones that survive refactors.
- Add metrics to take note of segments that were freed before ever having been synced to disk. This is correct behaviour, but can lead to interesting results (disks appear faster than they are because the cleaner is cleaning segments that never made it to disk).
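The throttling rule in the first bullet can be illustrated with a small pacing sketch. That rule delays each write operation if it occurs faster than it should have, given the constraint. This is an assumption-laden Python illustration rather than the backup's actual code. The class name WriteRateLimiter is hypothetical. Time is passed in explicitly instead of being read from a clock, and MB is taken to mean 2^20 bytes.

```python
class WriteRateLimiter:
    """Hypothetical pacing sketch for a --backupWriteRateLimit-style cap."""

    def __init__(self, limit_mb_per_sec: float, start_time: float = 0.0):
        self.bytes_per_sec = limit_mb_per_sec * 1024 * 1024  # MB assumed = 2**20 bytes
        self.next_allowed = start_time  # earliest time the next write may start

    def delay_for(self, now: float, num_bytes: int) -> float:
        """Return how long (seconds) a write issued at `now` must wait."""
        start = max(now, self.next_allowed)
        # Charge this write's share of the bandwidth budget.
        self.next_allowed = start + num_bytes / self.bytes_per_sec
        return start - now


limiter = WriteRateLimiter(limit_mb_per_sec=1)
one_mb = 1024 * 1024
print(limiter.delay_for(0.0, one_mb))  # 0.0: first write goes immediately
print(limiter.delay_for(0.0, one_mb))  # 1.0: second write arrived too soon
```

Writes that arrive no faster than the limit are never delayed. Only bursts that get ahead of the budget wait.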

2013-03-20

Commit:	d97d566
Author:	Stephen M. Rumble	2013-03-20 13:52:40 -0700

- Add a few more metrics and abstract out the printing of cleaner metrics into a separate class so that other utilities (or clients) can make use of it.
- Revert the cleaner's new balancing algorithm. It turns out it works great in some cases, but not so well in others. Go with a simpler approach that is easier to reason about. It still doesn't quite get the desired balance in some cases (large objects, high utilisation), but adding an additional compactor thread appears to rectify that. Time will tell...

2013-03-15

Commit:	c1fa0a6
Author:	Asaf Cidon	2013-03-14 17:53:21 -0700

Revert "Added LogCabin support for MinCopysets." This reverts commit 4374dca891f4adff03f7873e3c60b562bbcfdd60.

2013-03-14

Commit:	4ae1993
Author:	Stephen M. Rumble	2013-03-14 16:59:31 -0700

Add whole-log metrics to track what types of objects are in all of the segments.

2013-03-13

Commit:	914427e
Author:	Asaf Cidon	2013-03-12 18:40:31 -0700

Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud

Commit:	4374dca
Author:	Asaf Cidon	2013-03-12 18:39:54 -0700

Added LogCabin support for MinCopysets.

Commit:	e7ef7c5
Author:	Ankita Kejriwal	2013-03-12 17:23:16 -0700

Ensure unique (and strictly increasing) tableIds are assigned even in the face of coordinator crashes.

2013-03-12

Commit:	9315f48
Author:	Stephen M. Rumble	2013-03-12 15:48:18 -0700

- Compact segments that have a lot of tombstones and haven't been compacted recently when there are no better candidates (ones with dead object space). Tracking when tombstone space is free isn't easy, so do this instead.
- Add a standalone memory compaction benchmark (CleanerCompactionBenchmark) that is very similar to RecoverSegmentBenchmark. This makes it much easier to measure and fine tune the cleaner.

2013-03-11

Commit:	cd37639
Author:	Stephen M. Rumble	2013-03-10 17:28:37 -0700

More metrics.

2013-03-09

Commit:	e623627
Author:	Stephen M. Rumble	2013-03-08 17:16:18 -0800

Add more metrics to track not only the number of each log entry type we scan during cleaning, but their cumulative sizes.

2013-03-07

Commit:	a367ea6
Author:	Stephen M. Rumble	2013-03-06 17:37:41 -0800

Add some more cleaner metrics.

2013-03-06

Commit:	bad7231
Author:	Stephen M. Rumble	2013-03-05 21:40:10 -0800

Add more cleaner metrics.

2013-03-05

Commit:	2df4ab5
Author:	Stephen M. Rumble	2013-03-04 17:12:04 -0800

Track time distribution of how many threads are simultaneously running in the cleaner.

2013-03-04

Commit:	1d16a8a
Author:	Stephen M. Rumble	2013-03-04 13:39:55 -0800

Add a --maxNonVolatileBuffers parameter to backups. This specifies the maximum number of segments that may be queued up in memory. The default is still effectively unlimited (equal to the --segmentFrames parameter), but this default has the unfortunate property that backups can exceed a machine's free memory capacity when under heavy load and get slain by the OOM killer. While here, change a few NOTICE prints to DEBUG when backups reject open requests. This is to avoid a torrent of console spam when backups are heavily loaded.
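Here is a minimal sketch of the cap described above. It assumes the backup simply counts segments buffered in memory and rejects an open request once the count reaches the limit. The class and method names are hypothetical, and the real backup may track its buffers differently.

```python
from typing import Optional


class NonVolatileBufferPool:
    """Hypothetical sketch of a cap on in-memory segment buffers on a backup."""

    def __init__(self, segment_frames: int,
                 max_non_volatile_buffers: Optional[int] = None):
        # Default mirrors the commit: effectively unlimited, one buffer per frame.
        self.limit = (max_non_volatile_buffers
                      if max_non_volatile_buffers is not None else segment_frames)
        self.buffered = 0

    def try_open(self) -> bool:
        """Accept an open request only if another in-memory buffer is allowed."""
        if self.buffered >= self.limit:
            return False  # reject; the master is assumed to try another backup
        self.buffered += 1
        return True

    def on_flushed_to_disk(self) -> None:
        """A buffered segment reached disk, freeing its in-memory buffer."""
        if self.buffered > 0:
            self.buffered -= 1


pool = NonVolatileBufferPool(segment_frames=4, max_non_volatile_buffers=2)
print(pool.try_open(), pool.try_open(), pool.try_open())  # True True False
```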

2013-02-27

Commit:	88065f9
Author:	Stephen M. Rumble	2013-02-26 23:05:20 -0800

Add a few more cleaner metrics.

2013-02-23

Commit:	8189565
Author:	Stephen M. Rumble	2013-02-23 15:55:25 -0800

- Fix another cleaner bug: we used to first wait until sufficient survivor segments were available before starting to clean. Unfortunately, a minor bug caused our wait method to return, erroneously promising the availability of segments, and the subsequent allocation would fail. This has been fixed with a simpler mechanism: we simply block in the survivor allocation method until a segment is available.
- Add contention counters to SpinLocks, as well as (optional) names. Make every SpinLock instantiation link into a global list that we can traverse in order to populate a ProtoBuf and suck down these stats remotely. Now MasterService::getServerStatistics() returns these and LogCleanerBenchmark emits the most-contended locks.

2013-02-21

Commit:	0cbfd91
Author:	Stephen M. Rumble	2013-02-21 15:46:46 -0800

Add commandline options to specify the max number of threads to run in MasterService and the cleaner. The defaults are still 1.

2013-01-01

Commit:	7fdbf2e
Author:	Ankita Kejriwal	2012-12-31 16:19:38 -0800

Persist the server list version number across coordinator failures, so that even when there are no outstanding updates, the new coordinator learns the highest version number of any update sent out to the cluster before the coordinator crashed.

2012-12-31

Commit:	bc4c8be
Author:	Ankita Kejriwal	2012-12-31 12:57:29 -0800

Coordinator now retains update version numbers across failures.

Commit:	64b29d6
Author:	Ankita Kejriwal	2012-12-30 19:57:25 -0800

In CoordinatorServerList, handling various states of a crashed server. When a server crashes, its recovery and cleanup happen as follows:
1. ServerNeedsRecovery: This server's recovery has not been completed, and needs to be (re-)started when its crash is being handled.
2. ServerCrashed: Its CRASH update is sent out to the cluster, and recovery is started if its entry indicates that it needs recovery (set in the previous step).
3. ServerRecoveryCompleted: The server's recovery has completed. It now needs its REMOVE updates to be sent out to the cluster.
Each of these steps is implemented as a class that stores data and provides the required functions. They persist needed information to LogCabin and have corresponding recovery functions to recover that operation. This change is motivated by the need to persist server list update version numbers across coordinator failures. The remainder of the code to handle that will be committed soon. (Also fixed some minor latent bugs and style issues discovered along the way.)

2012-12-30

Commit:	25c85f1
Author:	Ankita Kejriwal	2012-12-30 02:42:36 -0800

Trying to make terms related to crashed servers unambiguous. Basically, a server "crashes", and after its recovery, cluster-wide updates, and cleanup, it is "removed". More details: If a server is not responding, it is detected as "crashed" (as opposed to "down" earlier). This calls hintServerCrashed() in the Coordinator Client (as opposed to hintServerDown()), which also has a correspondingly renamed WireFormat opcode. Similarly, on the Coordinator (in CSL), serverCrashed() (as opposed to serverDown()) starts recovery, sends all required updates to the cluster, and removes the server from the server list. The status of a server in the server list is marked as "CRASHED" when it has crashed and "REMOVE" once it has been successfully recovered (and hence removed from the cluster). At that point, any data related to it (for example, replicas on backups) can also be removed.

2012-12-20

Commit:	d177926
Author:	Stephen Yang	2012-12-20 05:46:31 -0800

Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/CoordinatorServerList.cc src/MasterRecoveryManagerTest.cc

Commit:	33c1d7f
Author:	Stephen Yang	2012-12-20 05:10:00 -0800

== Current ==
Rewrite of the Coordinator's Server List Update mechanism. It is now more understandable, with copious amounts of documentation, and adds more FT in DCFT for the Coordinator. It is also written in a way that can easily be separated from the main Coordinator (TODO).
== Issues ==
Fixes RAM-484 - The mechanism for updateSlots has been completely rewritten.
Fixes RAM-487 - Assert is removed and isClusterUpToDate has been rewritten. The old check was intended to ensure that the updater thread never sleeps when there are updates. The new equivalent prevents sleeping when there's more work, and lockups are only transient.
== Future ==
The highest priority is to allow the Coordinator to batch updates. This will ameliorate some JIRA issues. The new updater has the framework to allow this, but it has not been completely implemented yet. The lowest priority is to split the updater into its own class. Again, the framework has been set up; someone just has to grind through it.

2012-12-16

Commit:	f0ab50e
Author:	Ankita Kejriwal	2012-12-16 15:04:10 -0800

For coordinator operation splitTablet(), doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

2012-12-12

Commit:	366fbbc
Author:	Ankita Kejriwal	2012-12-12 04:38:12 -0800
Committer:	Ankita Kejriwal	2012-12-12 05:03:24 -0800

For Coordinator operation tabletRecovered(), doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

Commit:	d56c004
Author:	Ankita Kejriwal	2012-12-11 16:11:57 -0800
Committer:	Ankita Kejriwal	2012-12-11 16:18:33 -0800

For Coordinator operations to add and drop table, doing the following:
1. Persist state information in LogCabin.
2. Provide separate API calls for normal operation and for coordinator recovery.
3. Internally implement as classes that store required information and execute / complete the operation as needed during normal operation or recovery.

2012-11-08

Commit:	15e7fbc
Author:	Ankita Kejriwal	2012-11-08 14:35:55 -0800

Moving hintServerDown to CoordinatorService where it makes logical sense. CoordinatorServerList has method serverDown (renamed from forceServerDown) that is in turn called by hintServerDown after CoordinatorService verifies that the server is actually down. (Note that serverDown is also called by enlistServer when the enlisting server replaces an older crashed server.)

2012-11-03

Commit:	163f73e
Author:	Ankita Kejriwal	2012-11-03 02:33:05 -0700

Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/CoordinatorServerManager.cc src/CoordinatorServerManagerTest.cc src/MasterService.cc

Commit:	e1ce425
Author:	Ankita Kejriwal	2012-11-02 17:19:46 -0700

Moved server mgmt code from CoordinatorServerManager to CoordinatorServerList. This fixes RAM-454. Some other changes:
- Resolves a segmentation fault in CoordinatorServerList caused by the generateUniqueId() function by holding a lock / requiring the caller to hold a lock. This fixes RAM-478.
- MasterRecoveryManager method startMasterRecovery() now takes the entry corresponding to the server to be recovered rather than just its id. This change has been made to prevent a deadlock. (Deadlock scenario: When a server having MasterService crashes, CoordinatorServerList::forceServerDown() calls MasterRecoveryManager::startMasterRecovery(). This in turn calls back into CoordinatorServerList to retrieve the entry corresponding to the serverId of the crashed server. Since both of these CoordinatorServerList methods need a lock, it deadlocks.)
Some minor changes:
- Small fixes of typos / other lint where I discovered them.
- CoordinatorServerList doesn't have to be compiled on clients.
- StateServerDown protobuf is now ForceServerDown.

2012-10-30

Commit:	296a54b
Author:	Asaf Cidon	2012-10-30 10:56:48 -0700

Added support for the MinCopysets BackupSelector. Here's a summarized list of changes:
- I virtualized the selectSecondary function in BackupSelector.
- Added a command-line argument for useMinCopysets (the default is false).
- Added the MinCopysetsBackupSelector, which looks for a free backup with the same replicationId as the primary replica.
Still need to add support for MinCopysets in handleBackupFailures and test everything together.
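The MinCopysetsBackupSelector rule in this commit can be sketched as below. It looks for a free backup with the same replicationId as the primary replica. This is a hypothetical Python illustration. Backups are modeled as a dict from server id to replicationId, and "free" is interpreted as not already holding a replica of the segment being placed. Candidates are scanned in id order only to keep the sketch deterministic.

```python
from typing import Dict, Optional, Set


def select_secondary_min_copysets(primary_replication_id: int,
                                  backups: Dict[int, int],
                                  in_use: Set[int]) -> Optional[int]:
    """Hypothetical sketch: pick a free backup in the primary's replication group.

    backups maps server id -> replicationId; in_use holds ids that already
    store a replica of this segment (our reading of "free").
    """
    for server_id in sorted(backups):
        if server_id in in_use:
            continue
        if backups[server_id] == primary_replication_id:
            return server_id
    return None  # no free backup in the same replication group


backups = {1: 7, 2: 9, 3: 7, 4: 7}
print(select_secondary_min_copysets(7, backups, in_use={1}))  # prints 3
```

Restricting secondaries to the primary's replication group keeps each segment's replicas inside a fixed set of servers. As we understand MinCopysets, that restriction is its intent.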

2012-10-02

Commit:	b237f5f
Author:	Stephen M. Rumble	2012-10-01 17:57:11 -0700

Merge branch 'master' of ssh://fiz.stanford.edu/git/ramcloud Conflicts: src/BackupSelector.cc src/BackupServiceTest.cc src/CoordinatorMain.cc src/InfRcTransport.h src/Log.cc src/LogTest.cc src/Makefrag src/MasterServiceTest.cc src/ReplicaManagerTest.cc src/ReplicatedSegment.cc src/ReplicatedSegmentTest.cc src/Segment.cc src/Segment.h src/SegmentIterator.cc src/SegmentManager.cc src/SegmentManager.h src/SegmentTest.cc src/ServerMain.cc

2012-10-01

Commit:	b493707
Author:	Stephen M. Rumble	2012-10-01 14:05:54 -0700

Keep track of the distribution of the number of segments on disk. Also, dump server statistics after prefill as well as after the complete benchmark in LogCleanerBenchmark.

2012-09-25

Commit:	5f0b0b6
Author:	Asaf Cidon	2012-09-25 14:51:58 -0700

Propagate the replicationId from the CoordinatorServerList to all the ServerLists.

2012-09-24

Commit:	8ab71e2
Author:	Ryan Stutsman	2012-09-20 10:18:29 -0700
Committer:	Ryan Stutsman	2012-09-24 16:21:03 -0700

Recovery from lost open replicas no longer involves log module. When replicas of open segments may have been lost the replica manager increments an "epoch" number associated with the segment. It then tries to contact all existing (unclosed) replicas to tag them with the new epoch number and recreates a new replica with the new epoch number. It then updates the replication epoch which consists of the open segment id and epoch number on the coordinator. During recovery the coordinator can use the segment id and epoch number to filter out any invalid replicas. This decouples the backup recovery logic from the log module because there is no need to roll over to a new log head with this new technique. Open segments remain open even after experiencing and recovering from failures.
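The coordinator-side filtering step can be sketched as follows. The exact validity rule is an assumption here. The sketch treats a replica as stale only if it belongs to the open segment recorded in the replication epoch and carries a lower epoch number; replicas of other segments are kept. The Replica type and filter_replicas function are hypothetical.

```python
from typing import List, NamedTuple


class Replica(NamedTuple):
    segment_id: int
    epoch: int


def filter_replicas(replicas: List[Replica], epoch_segment_id: int,
                    epoch: int) -> List[Replica]:
    """Drop replicas of the recorded open segment whose epoch is too old."""
    return [r for r in replicas
            if not (r.segment_id == epoch_segment_id and r.epoch < epoch)]


found = [Replica(10, 0), Replica(10, 1), Replica(9, 0)]
# The coordinator recorded (segment 10, epoch 1) after a replica was lost,
# so the epoch-0 replica of segment 10 is filtered out.
print(filter_replicas(found, epoch_segment_id=10, epoch=1))
```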

2012-09-09

Commit:	520a71a
Author:	Stephen M. Rumble	2012-09-08 21:33:42 -0700

Merge commit '8d04ef4ac5e860b1ac79c296f1ef09cf2698c359' Conflicts: src/BackupService.cc src/Log.cc src/Log.h src/LogCleaner.cc src/LogCleaner.h src/LogIteratorTest.cc src/LogTest.cc src/MasterService.h src/ReplicaManagerTest.cc src/SegmentManager.cc src/SegmentManager.h src/SegmentManagerTest.cc src/Server.cc src/TransportManager.cc

2012-09-07

Commit:	d5118c4
Author:	Stephen Yang	2012-09-06 21:28:02 -0700

Changed the Coordinator Server List Updater to handle multiple RPCs at a time. Changed the scripts to run on infrc instead of fast+udp at the discretion of John for RAM-460. Fixes RAM-453. Fixes RAM-458: removed the getServerList RPC and the corresponding call in FailureDetector.

2012-09-06

Commit:	52d3993
Author:	Stephen M. Rumble	2012-09-05 23:08:49 -0700

Print various metrics after the log cleaner benchmark completes.

2012-09-05

Commit:	44450b9
Author:	Stephen M. Rumble	2012-09-05 12:33:06 -0700

Add more log metrics. Break out hash table's histogram into common file, make it (de)serialisable via protobufs. LogCleanerBenchmark now uses asynchronous RPCs and can pipeline a configurable number of writes to more heavily load the server.

2012-09-04

Commit:	c0c9248
Author:	Ryan Stutsman	2012-08-31 17:14:25 -0700
Committer:	Ryan Stutsman	2012-09-04 14:24:21 -0700

Rework/re-enable benchmark code. Backups no longer perform write benchmarks. It wasn't ever used and it isn't clear where they should perform the benchmark since their storage may contain actual data. If we want it later it shouldn't be too hard to add back in in a more sensible way.

2012-08-27

Commit:	b5f32e8
Author:	Stephen M. Rumble	2012-08-26 20:00:31 -0700

Add a GET_LOG_METRICS rpc and a new protocol buffer that contains metrics from the Log, LogCleaner, LogSegment, SegmentManager, and SegletAllocator modules. This gives clients a big window into the inner workings of the log during operation. The regular metrics mechanism wasn't used because it is a little too basic in that it only supports fixed integer fields. We will likely want dynamic counters (e.g. per-segment) and very large pieces of statistics like histograms that will be vectors of integers.

2012-08-26

Commit:	49c7cd4
Author:	Stephen M. Rumble	2012-08-26 16:00:44 -0700

Add a GetServerConfig RPC that returns a protobuf with the contents of the server's ServerConfig class. This lets clients peek at the runtime configuration of servers. Add a handful of new runtime options for the cleaner. Thread the config object down into various constructors to pull config options, rather than manually extracting the needed fields in different layers. Perhaps the global server config should be a member of context...

2012-08-20

Commit:	ea6ca1c
Author:	Ankita Kejriwal	2012-08-20 14:16:41 -0700

Adding LogCabin helper method to get the entry type of a given LogCabin entry.

2012-08-17

Commit:	5fbb36d
Author:	Ankita Kejriwal	2012-08-17 01:04:15 -0700

This, along with the previous commit, fixes RAM-456. Adding a new protobuf ServerUpdate, removing extra fields from ServerInformation. Modifying server related operations according to RAM-456.

2012-08-16

Commit:	03057b6
Author:	Ankita Kejriwal	2012-08-15 17:09:57 -0700
Committer:	Ankita Kejriwal	2012-08-16 11:13:00 -0700

Merging StateEnlistServer and ServerInformation protobufs. These protobufs stored identical information (after commit c269643). For simplicity, this commit removes StateEnlistServer protobuf. The state is stored in ServerInformation while starting to enlist a server with entry_type "ServerEnlisting", and after it has successfully enlisted with entry_type "ServerEnlisted".

2012-08-13

Commit:	3d92502
Author:	Ankita Kejriwal	2012-08-13 14:19:23 -0700

For enlist server operations in which the enlisting server replaces another server, the logging (in LogCabin) for forcing out the server being replaced and for enlisting the new server is now done separately.

Commit:	0576d48
Author:	Ankita Kejriwal	2012-08-13 14:04:47 -0700

Renaming the protobuf StateHintServerDown to StateServerDown. This is because logging this state logically means that we want to force out a server, i.e., do serverDown(), rather than suspect that a server is down, i.e., hintServerDown().

2012-08-09

Commit:	5e49323
Author:	Stephen M. Rumble	2012-08-09 14:43:05 -0700

Close RAM-437 by making WallTime.cc's methods static parts of a WallTime class. Add a few unit tests while here to sanity check the clock and its conversion to Unix time.

Commit:	f8e598f
Author:	Ryan Stutsman	2012-08-09 09:47:23 -0700

Added missing file from 3b65a2.

2012-08-02

Commit:	50bac37
Author:	Ankita Kejriwal	2012-08-02 16:41:48 -0700

Reverting commit that introduced recovery path for enlistServer. This will hopefully make it simpler for Stephen to add the asynchronous updates code, and I will add this functionality back after that. This reverts commit d62fb9257c5dea228c161652de1b5053931c7052. Conflicts: src/CoordinatorServerManagerTest.cc

2012-07-31

Commit:	d62fb92
Author:	Ankita Kejriwal	2012-07-31 13:11:47 -0700
Committer:	Ankita Kejriwal	2012-07-31 13:12:43 -0700

Changing enlistServer code in CoordinatorServerManager.
1. There are separate functions in the API for normal operation and recovery.
2. Adding monitor-style locking for thread safety.

2012-07-25

Commit:	3dbe775
Author:	Ankita Kejriwal	2012-07-25 16:20:29 -0700

Modifying HintServerDown logging according to new logging semantics with ServerInformation ProtoBuf.

Commit:	57fbb8a
Author:	Ankita Kejriwal	2012-07-24 15:14:32 -0700
Committer:	Ankita Kejriwal	2012-07-25 16:17:08 -0700

SetMinOpenSegmentId now uses the new ServerInformation ProtoBuf to log state.