An implementation of a distributed KV store backed by Raft, tolerant of node failures and network partitions 🚣

Overview

barge

A simple implementation of a consistent, distributed Key:Value store that uses the Raft Consensus Algorithm.

This project launches a cluster of raft nodes that run as concurrent state machines. A client interface is also provided for finding the leader and sending data to the cluster. Each node is equipped with gRPC clients/servers for communication with the client interface and other nodes.

The cluster is tolerant to node failures and network partitions (a cluster of N nodes can support ⌈N/2⌉ - 1 failures). The cluster is smart enough to elect a new leader if needed, and the client is smart enough to find the leader.
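
As a quick sanity check on that claim, the quorum arithmetic can be expressed in a few lines of Go (illustrative only, not code from this project):

    package main

    import "fmt"

    // majority returns how many nodes must acknowledge an entry (or grant a
    // vote) for it to count as a quorum in a cluster of n nodes.
    func majority(n int) int {
        return n/2 + 1
    }

    func main() {
        for _, n := range []int{3, 5, 7} {
            // A cluster keeps operating as long as a quorum survives,
            // i.e. it tolerates ceil(n/2) - 1 failures.
            fmt.Printf("cluster of %d: quorum %d, tolerates %d failures\n",
                n, majority(n), n-majority(n))
        }
    }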

Interesting Events

This project offers a debug flag which creates a log.html file that contains a table of cluster events. The log used to generate the report below is included as log-heartbeats-purged.html. It contains the data for all the interesting events below, but 8000+ lines of heartbeat events have been manually removed to keep the file short.

1. Initial Leader Election

All the nodes start as followers without any data. Node 1's randomized election timer runs out first, so it increments its term and starts an election. Nodes 0 and 2 grant their votes to node 1, and node 1 votes for itself. Now, all nodes are in term 1 with node 1 as the leader.
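
A minimal sketch of that election flow, with the RequestVote RPC stubbed out by a function value so the snippet stays self-contained; the names and timings are illustrative, not barge's actual code:

    package main

    import (
        "fmt"
        "math/rand"
        "time"
    )

    type node struct {
        id          int
        term        int
        votedFor    int
        peers       []int
        requestVote func(peer, term, candidate int) bool // stand-in for the gRPC RequestVote call
    }

    // electionTimeout returns a randomized timeout so that nodes rarely time
    // out at the same instant, which keeps split votes uncommon.
    func electionTimeout() time.Duration {
        return time.Duration(150+rand.Intn(150)) * time.Millisecond
    }

    // startElection increments the term, votes for itself, and asks each peer
    // for a vote; it wins if it reaches a majority of the whole cluster.
    func (n *node) startElection() bool {
        n.term++
        n.votedFor = n.id
        votes := 1 // its own vote
        for _, p := range n.peers {
            if n.requestVote(p, n.term, n.id) {
                votes++
            }
        }
        return votes >= (len(n.peers)+1)/2+1
    }

    func main() {
        n := &node{
            id:    1,
            peers: []int{0, 2},
            // Assume both peers grant their vote, matching the log above.
            requestVote: func(peer, term, candidate int) bool { return true },
        }
        time.Sleep(electionTimeout()) // the timer expiring is what triggers the election
        fmt.Println("node 1 elected leader:", n.startElection(), "in term", n.term)
    }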

2. Heartbeats

As the leader, node 1 has the duty to indicate that there is a leader in the cluster. It sends empty, idempotent requests to append data to each node in the cluster to indicate that it is functional. No more elections start because each follower node resets its election timer upon receiving these heartbeats.
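
A sketch of what such a heartbeat loop can look like; sendAppendEntries below is a stand-in for the project's gRPC call, so treat the whole snippet as an assumption about the mechanism rather than barge's implementation:

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        followers := []int{0, 2}

        // sendAppendEntries stands in for the gRPC call; an empty entry list
        // makes the request an idempotent heartbeat.
        sendAppendEntries := func(follower int, entries []string) {
            fmt.Printf("heartbeat -> node %d (%d entries)\n", follower, len(entries))
        }

        ticker := time.NewTicker(50 * time.Millisecond)
        defer ticker.Stop()

        for i := 0; i < 3; i++ { // a few rounds are enough for the demo
            <-ticker.C
            for _, f := range followers {
                sendAppendEntries(f, nil) // nil entries == heartbeat
            }
        }
    }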

3. Updating Data in the Cluster

Node 1, the leader, receives data {key: 100, value: "Some Value"} from the client. It tells nodes 0 and 2 to append that change to their logs. Because both succeed, node 1 knows the entry is successfully duplicated on the majority of servers, so it commits the log entry.

In the following heartbeat, node 1 tells its followers that it has committed up to the last entry, so nodes 0 and 2 commit their log entries locally.
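
The commit rule described in this section can be sketched as follows; the data structures are assumptions for illustration, not the project's types. The leader counts how many nodes hold the newest entry, commits once that count reaches a quorum, and followers raise their own commit index to the leader's on the next heartbeat:

    package main

    import "fmt"

    type entry struct {
        key   int32
        value string
    }

    func main() {
        clusterSize := 3

        // The leader's log after accepting {key: 100, value: "Some Value"}.
        leaderLog := []entry{{100, "Some Value"}}

        // matchIndex[i] = length of the log known to be duplicated on follower i.
        matchIndex := map[int]int{0: 1, 2: 1} // both followers appended the entry

        // Count the leader itself plus every follower that holds the newest entry.
        replicas := 1
        for _, idx := range matchIndex {
            if idx >= len(leaderLog) {
                replicas++
            }
        }
        leaderCommit := 0
        if replicas >= clusterSize/2+1 {
            leaderCommit = len(leaderLog)
        }
        fmt.Println("leader commit index:", leaderCommit)

        // On the next heartbeat the followers learn leaderCommit and raise
        // their own commit index to match, committing the entry locally.
        followerCommit := 0
        if leaderCommit > followerCommit {
            followerCommit = leaderCommit
        }
        fmt.Println("follower commit index after heartbeat:", followerCommit)
    }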

4. Follower Node Failure

Node 0 dies and does not receive heartbeats from node 1.

Node 1 gets a request to add a value {key: 200}. Although it cannot send it to node 0, it successfully sends the entry to node 2. It commits the entry, knowing that it is duplicated on the majority of nodes.

Node 0 comes back to life and starts receiving heartbeats again.

Node 1 gets a request to add another value {key: 300}. When sending it to node 0, it sees that node 0 missed the previous value {key: 200} while it was dead. Node 1 catches up node 0 by giving it log entries to append. On the following heartbeat, all the nodes see that the leader has committed up to {key: 300}, so they commit any missed log entries up to and including the entry with {key: 300}.
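
A sketch of that catch-up step, using assumed data rather than the project's real log type: the leader compares how much of its log the revived follower has and ships the missing suffix in one append.

    package main

    import "fmt"

    type entry struct {
        key   int32
        value string
    }

    func main() {
        // The leader's log after the events above; the values for keys 200 and
        // 300 are omitted, as in the log excerpt.
        leaderLog := []entry{{100, "Some Value"}, {200, ""}, {300, ""}}

        // Node 0 was dead when {key: 200} arrived, so it only holds the first entry.
        followerLen := 1

        // The leader sends every entry the follower is missing in one append call.
        missing := leaderLog[followerLen:]
        fmt.Printf("catching up follower with %d entries: %+v\n", len(missing), missing)

        // Once appended, the follower's log matches the leader's, and the next
        // heartbeat lets it commit up to the leader's commit index.
        followerLen += len(missing)
        fmt.Println("follower log length now:", followerLen)
    }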

5. Leader Node Failure and Re-Election

Here, the leader node dies. Node 2 starts an election for itself but fails to get elected as leader, most likely because node 0 started an election at roughly the same time and voted for itself. Node 0 then starts an election and succeeds, getting a vote from node 2 and one from itself.

As the new leader, node 0 gets a new entry modifying {key: 200}. It duplicates the entry onto node 2 and commits it. Node 2 commits it after the following heartbeat.

Node 1 comes back online and starts getting heartbeats again.

Node 0 gets a request to add an entry modifying {key: 300}. When sending it to node 1, it sees that node 1 is out of date, so it catches node 1 up. Node 1 appends the entry it missed ({key: 200}) and the new entry ({key: 300}) to its log. Node 2 appends the new entry to its log. Node 0 commits its newest entry, knowing that it was duplicated on the majority of servers (all of them in this case). During the next heartbeat, nodes 1 and 2 see that the leader has committed up to its newest entry {key: 300}, so they commit all uncommitted entries up to that point.

Resources

Usage

The configuration.yml file defines the networking configs needed to run the simulation.
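
The exact schema of configuration.yml lives in the repository; as an assumption about what it likely holds (node ids and gRPC addresses), a matching Go struct might look like this, shown purely for illustration:

    package main

    import "fmt"

    // NodeConfig and Config are an assumption about what configuration.yml
    // likely describes; the real schema is defined by the repository.
    type NodeConfig struct {
        ID      int    `yaml:"id"`
        Address string `yaml:"address"` // host:port the node's gRPC server listens on
    }

    type Config struct {
        Nodes []NodeConfig `yaml:"nodes"`
    }

    func main() {
        cfg := Config{Nodes: []NodeConfig{
            {ID: 0, Address: "localhost:50051"},
            {ID: 1, Address: "localhost:50052"},
            {ID: 2, Address: "localhost:50053"},
        }}
        fmt.Printf("%+v\n", cfg)
    }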

To launch the cluster with the HTML log: go build && ./barge -debug

To launch the client: go build && ./barge -client

Client Commands (an example session is sketched after this list):

  1. {key}={value} Set an int32 key to a string value
  2. kill {nodeId} Kill the node with the given id (must be a node id from the config file)
  3. revive {nodeId} Restore a dead node to the follower state
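
A hypothetical session using only the commands listed above (the keys, values, and node id are arbitrary examples):

    100=hello
    kill 1
    200=world
    revive 1
    300=goodbye

Depending on whether the killed node happens to be the current leader, a sequence like this reproduces either the follower-failure or the leader re-election events shown in the log.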

Development

Download deps: go mod download

Generate protocol buffer code for node <-> node RPCs: protoc -I node/ node/raft.proto --go_out=plugins=grpc:node

Generate protocol buffer code for client <-> cluster RPCs: protoc -I api/ api/api.proto --go_out=plugins=grpc:api

Formatting
go fmt ./...
