In this assignment you will build a fault-tolerant key-value storage service using your Raft library from the previous assignments. Your key-value service will be structured as a replicated state machine with several key-value servers that coordinate their activities through the Raft log. Your key/value service should continue to process client requests as long as a majority of the servers are alive and can communicate, in spite of other failures or network partitions.
Your system will consist of clients and key/value servers, where each key/value server also acts as a Raft peer. Clients send Put(), Append(), and Get() RPCs to key/value servers (called kvraft servers), who then place those calls into the Raft log and execute them in order. A client can send an RPC to any of the kvraft servers, but if that server is not currently a Raft leader, or if there's a failure, the client should retry by sending to a different server. If the operation is committed to the Raft log (and hence applied to the key/value state machine), its result is reported to the client. If the operation failed to commit (for example, if the leader was replaced), the server reports an error, and the client retries with a different server.
We have supplied you with skeleton code and tests under this directory. You will need to modify kvraft/client.go, kvraft/server.go, and perhaps kvraft/common.go. (Even if you don't modify common.go, you should submit it as-provided.) For this assignment you should use your own implementation of Raft from HW4.
To get up and running, copy raft.go file from assignment4 to the assignment5\src\raft folder, then execute the following commands, as in the previous assignments, and change into the src/kvraft directory:
# Go needs $GOPATH to be set to the directory containing "src" $ cd 418/assignment5 $ export GOPATH="$PWD" $ cd "$GOPATH/src/kvraft" $ go test
The service supports three RPCs: Put(key, value), Append(key, arg), and Get(key). It maintains a simple database of key/value pairs. Put() replaces the value for a particular key in the database, Append(key, arg) appends arg to key's value, and Get() fetches the current value for a key. An Append to a non-existant key should act like Put.
You will implement the service as a replicated state machine consisting of several kvservers. Your kvraft client code (Clerk in src/kvraft/client.go) should try different kvservers it knows about until one responds positively. As long as a client can contact a kvraft server that is a Raft leader in a majority partition, its operations should eventually succeed.
Your kvraft servers should not directly communicate; they should only interact with each other through the Raft log.
Your first task is to implement a solution that works when there are no dropped messages, and no failed servers. Note that your service must provide sequential consistency to applications that use its client interface. That is, completed application calls to the Clerk.Get(), Clerk.Put(), and Clerk.Append() methods in kvraft/client.go must appear to have affected all kvservers in the same order, and have at-most-once semantics. A Clerk.Get(key) should see the value written by the most recent Clerk.Put(key, …) or Clerk.Append(key, …) (in the total order).
A reasonable plan of attack may be to first fill in the Op struct in server.go with the "value" information that kvraft will use Raft to agree on (remember that Op field names must start with capital letters, since they will be sent through RPC), and then implement the PutAppend() and Get() handlers in server.go. The handlers should enter an Op in the Raft log using Start(), and should reply to the client when that log entry is committed. Note that you cannot execute an operation until the point at which it is committed in the log (i.e., when it arrives on the Raft applyCh).
After calling Start(), your kvraft servers will need to wait for Raft to complete agreement. Commands that have been agreed upon arrive on the applyCh. You should think carefully about how to arrange your code so that your code will keep reading applyCh, while PutAppend() and Get() handlers submit commands to the Raft log using Start(). It is easy to achieve deadlock between the kvserver and its Raft library.
Your solution needs to handle the case in which a leader has called Start() for a client RPC, but loses its leadership before the request is committed to the log. In this case you should arrange for the client to re-send the request to other servers until it finds the new leader. One way to do this is for the server to detect that it has lost leadership, by noticing that a different request has appeared at the index returned by Start(), or that the term reported by Raft.GetState() has changed. If the ex-leader is partitioned by itself, it won't know about new leaders; but any client in the same partition won't be able to talk to a new leader either, so it's OK in this case for the server and client to wait indefinitely until the partition heals. More generally, a kvraft server should not complete a Get() RPC if it is not part of a majority.
You have completed Part I when you reliably pass the first test in the test suite: "One client". You may also find that you can pass the "concurrent clients" test, depending on how sophisticated your implementation is. From the src/kvraft directory:
$ go test -v -run TestBasic === RUN TestBasic Test: One client ... ... Passed --- PASS: TestBasic (15.18s) PASS ok kvraft 15.190s
In the face of unreliable connections and node failures, your clients may send RPCs multiple times until it finds a kvraft server that replies positively. One consequence of this is that you must ensure that each application call to Clerk.Put() or Clerk.Append() must appear in that order just once (i.e., write the key/value database just once).
Thus, your task in Part II is to cope with duplicate client requests, including situations where the client sends a request to a kvraft leader in one term, times out waiting for a reply, and re-sends the request to a new leader in another term. The client request should always execute just once.
You will need to uniquely identify client operations to ensure that they execute just once. You can assume that each clerk has only one outstanding Put, Get, or Append.
For stability, you must make sure that your scheme for duplicate detection frees server memory quickly, for example by having the client tell the servers which RPCs it has heard a reply for. It's OK to piggyback this information on the next client request.
You have completed Part II when you reliably pass all tests through TestPersistPartitionUnreliable().
$ go test -v === RUN TestBasic Test: One client ... ... Passed --- PASS: TestBasic (15.22s) === RUN TestConcurrent Test: concurrent clients ... ... Passed --- PASS: TestConcurrent (15.83s) === RUN TestUnreliable Test: unreliable ... ... Passed --- PASS: TestUnreliable (16.68s) === RUN TestUnreliableOneKey Test: Concurrent Append to same key, unreliable ... ... Passed --- PASS: TestUnreliableOneKey (1.40s) === RUN TestOnePartition Test: Progress in majority ... ... Passed Test: No progress in minority ... ... Passed Test: Completion after heal ... ... Passed --- PASS: TestOnePartition (2.54s) === RUN TestManyPartitionsOneClient Test: many partitions ... ... Passed --- PASS: TestManyPartitionsOneClient (24.08s) === RUN TestManyPartitionsManyClients Test: many partitions, many clients ... ... Passed --- PASS: TestManyPartitionsManyClients (26.12s) === RUN TestPersistOneClient Test: persistence with one client ... ... Passed --- PASS: TestPersistOneClient (18.68s) === RUN TestPersistConcurrent Test: persistence with concurrent clients ... ... Passed --- PASS: TestPersistConcurrent (19.34s) === RUN TestPersistConcurrentUnreliable Test: persistence with concurrent clients, unreliable ... ... Passed --- PASS: TestPersistConcurrentUnreliable (20.37s) === RUN TestPersistPartition Test: persistence with concurrent clients and repartitioning servers... ... Passed --- PASS: TestPersistPartition (26.91s) === RUN TestPersistPartitionUnreliable Test: persistence with concurrent clients and repartitioning servers, unreliable... ... Passed --- PASS: TestPersistPartitionUnreliable (26.89s) PASS ok kvraft 214.069s
This assignment doesn't require you to write much code, but you will most likely spend a substantial amount of time thinking and staring at debugging logs to figure out why your implementation doesn't work. Debugging will be more challenging than in the Raft lab because there are more components that work asynchronously of each other. Start early!
You should implement the service without worrying about the Raft log's growing without bound. You do not need to implement snapshots (from Section 7 in the paper) to allow garbage collection of old log entries.
As noted, a kvraft server should not complete a Get() RPC if it is not part of a majority (so that it does not serve stale data). A simple solution is to enter every Get() (as well as each Put() and Append()) in the Raft log. You don't have to implement the optimization for read-only operations that is described in Section 8.
In Part I, you should probably modify your client Clerk to remember which server turned out to be the leader for the last RPC, and send the next RPC to that server first. This will avoid wasting time searching for the leader on every RPC.
Test | Points |
---|---|
TestBasic$ | 8 |
TestConcurrent$ | 8 |
TestUnreliable$ | 8 |
TestUnreliableOneKey$ | 8 |
TestOnePartition$ | 8 |
TestManyPartitionsOneClient$ | 8 |
TestManyPartitionsManyClients$ | 8 |
TestPersistOneClient$ | 8 |
TestPersistConcurrent$ | 9 |
TestPersistConcurrentUnreliable$ | 9 |
TestPersistPartition$ | 9 |
TestPersistPartitionUnreliable$ | 9 |
You hand in your assignment as before.
$ git commit -am "[you fill me in]"
$ git tag -a -m "i finished assignment 5" a5-handin
$ git push origin master a5-handin
Recall, in order to overwrite a tag use the force flag as follows.
$ git tag -fam "i finished assignment 5" a5-handin
$ git push -f --tags
You will receive full credit for Part I if your software passes the tests mentioned for that section on the CS servers. You will receive full credit for Part II if your software passes the tests mentioned for that section on the CS servers.
This assignment is adapted from MIT's 6.824 course. Thanks to Frans Kaashoek, Robert Morris, and Nickolai Zeldovich for their support.