:rowboat: Raft implementation in Go

Related tags

Data Structures raft
Overview

🚣 Raft

This is an instructional implementation of the Raft distributed consensus algorithm in Go. It's accompanied by a series of blog posts:

Each of the partN directories in this repository is the complete source code for Part N of the blog post series (except Part 0, which is introductory and has no code). There is a lot of duplicated code between the different partN directories - this is a conscious design decision. Rather than abstracting and reusing parts of the implementation, I opted for keeping the code as simple as possible. Each directory is completely self contained and can be read and undestood in isolation. Using a graphical diff tool to see the deltas between the parts can be instructional.

How to use this repository

You can read the code, but I'd also encourage you to run tests and observe the logs they print out. The repository contains a useful tool for visualizing output. Here's a complete usage example:

$ cd part1
$ go test -v -race -run TestElectionFollowerComesBack |& tee /tmp/raftlog
... logging output
... test should PASS
$ go run ../tools/raft-testlog-viz/main.go < /tmp/raftlog
PASS TestElectionFollowerComesBack map[0:true 1:true 2:true TEST:true] ; entries: 150
... Emitted file:///tmp/TestElectionFollowerComesBack.html

PASS

Now open file:///tmp/TestElectionFollowerComesBack.html in your browser. You should see something like this:

Image of log browser

Scroll and read the logs from the servers, noticing state changes (highlighted with colors). Feel free to add your own cm.dlog(...) calls to the code to experiment and print out more details.

Changing and testing the code

Each partN directory is completely independent of the others, and is its own Go module. The Raft code itself has no external dependencies; the only require in its go.mod is for a package that enables goroutine leak testing - it's only used in tests.

To work on part2, for example:

$ cd part2
... make code changes
$ go test -race ./...

Depending on the part and your machine, the tests can take up to a minute to run. Feel free to enable verbose logging with -v, and/or used the provided dotest.sh script to run specific tests with log visualization.

Contributing

I'm interested in hearing your opinion or suggestions for the code in this repository. Feel free to open an issue if something is unclear, or if you think you found a bug. Code contributions through PRs are welcome as well.

Issues
  • Retired old leader may send a heartbeat with a new term accepted from the new leader

    Retired old leader may send a heartbeat with a new term accepted from the new leader

    Thanks for the author's articles, which is very useful for me to understand the procedure details of the raft.

    I think there may be a misbehavior that the old leader may send a heartbeat with a new term accepted from the new leader: because the leader check procedure for the heartbeat sending and the real hartbeat sending with term is not atomic.

    1. when the old leader try to send a heartbeat , the check procedure is passed ( https://github.com/eliben/raft/blob/master/part3/raft.go#L576-L581 )
    2. after passing the check , the old leader try to send the heartbeat, however, at this point, another candidate with higher term launch another round of vote ( this could be happened from a network partitioned follower reconnecting back), and this loader leader become a follower with the higher term
    3. The heartbeat sending procedure of the old leader continues, this old leader sends a heartbeat with a higher term accepted from the new leader

    I think this may be a misbehavior, the old leader should sending a heartbeat with the old term, not new(higher) term.

    if doSend {
        heartbeat_term := 0
        cm.mu.Lock()
        if cm.state != Leader {
            cm.mu.Unlock()
            return
        }
        heartbeat_term = currentTerm
        cm.mu.Unlock()
        cm.leaderSendAEs(heartbeat_term )  // Sending heartbeat with the term of the leader term.
    }
    
    opened by lday0321 3
  • TestCrashAfterSubmit

    TestCrashAfterSubmit

    I don't think this is a bug. Maybe it's a doubt.

    image

    if sleepMs(1) is commented , the test will be failure.

    Because mechanism of goroutine will execute leaderSendAEs before leader crashed.

    But environment is very complex , I don't think this is a clearness result.

    3ks for your sharing & expecting answer.

    Best regards.

    opened by dopeter 3
  • Why do need to use lock in RPC calls?

    Why do need to use lock in RPC calls?

    I am a newbie who just learned the go language, and when I read the following code, I was very confused.

    func (s *Server) Call(id int, serviceMethod string, args interface{}, reply interface{}) error {
    	s.mu.Lock()
    	peer := s.peerClients[id]
    	s.mu.Unlock()
    
    	// If this is called after shutdown (where client.Close is called), it will
    	// return an error.
    	if peer == nil {
    		return fmt.Errorf("call client %d after it's closed", id)
    	} else {
    		return peer.Call(serviceMethod, args, reply)
    	}
    }
    

    My understanding of locks is limited to when multiple goroutines read and write to a shared variable, the lock is needed to prevent one goroutine from reading the shared variable that other goroutines have not finished writing. In my opinion, for two RPC calls to the same node, the two RPCs are not related (in each RPC call, new parameters are generated for the call), so there is no need to lock.

    question 
    opened by zhazhalaila 2
  • What if this heartbeat with empty entry ?

    What if this heartbeat with empty entry ?

    https://github.com/eliben/raft/blob/d4843212bc8a8c685729a15b94363ccaffcb2b51/part2/raft.go#L483

    For example, there are 1 leader, 2 followers,

    [LEADER]
    logs:        [x, y, z]
    nextIndex[]: [3, 3, 3]
    

    next heartbeat, the variable ni = 3, entries := cm.log[3:]will out of range. Is it a bug or I miss something?

    opened by byteroll 2
  • becomeFollower may update the term from a higher one back to a low one

    becomeFollower may update the term from a higher one back to a low one

    when the node try to become a follower, it will update its currentTerm to a specified term passed to becomeFollower

    I think current implementation of becomeFollower may update currentTerm to low one, which may not be allowed.

    There are 2 entires for a node to call becomeFollower: one is from the RequestVote reply, another is from the AppendEntries reply.

    from AppendEntry reply, the condition for becomeFollower(...) is if reply.Term > savedCurrentTerm ( https://github.com/eliben/raft/blob/master/part3/raft.go#L620-L622 ). This could be a misbehavior:

    1. there are 5 nodes in the raft group, current node is the leader with term 1.
    2. another 2 nodes (node_0, node_1) are network partitioned, node_0's vote term has been increase to 4, node_1's vote term has been increased to 9.
    3. current node send AppendEntries to all other 4 nodes, and its savedCurrentTerm is 1 right now.
    4. node_1 with vote term(9) reconnected back, it sending VoteRequest with term(9), current node becom follower by thie VoteRequest since current node's term is 1, which is less than 9 and, its currentNode has been changed to 9 in becomeFollower
    5. node_0 with vote term(4)reconnected back, it received current node's AppendEntries and it reply this rpc with reply.Term = 4, when current node recevied this reply, it will reset its currentTerm from 9 back to 4

    I think it should compare the reply.Term with cm.currentTerm, not with savedCurrentTerm:

    if err := cm.server.Call(peerId, "ConsensusModule.AppendEntries", args, &reply); err == nil {
      cm.mu.Lock()
      defer cm.mu.Unlock()
      if reply.Term > cm.currentTerm {  // Not compared with savedCurrentTerm
    	  cm.dlog("term out of date in heartbeat reply")
    	  cm.becomeFollower(reply.Term)
    	  return
      }
    
    opened by lday0321 1
  • raft.go: simplify checking for majority of votes

    raft.go: simplify checking for majority of votes

    I was looking how Raft would deal with split-votes. It split-votes can't occur forever because of randomized timeouts. At some point a candidate with a shorter timeout will receive votes from a majority of the servers in the full cluster and then becomes the leader.

    But what means majority? Obviously it means more than half of the servers, which is mathematically majority >= floor(n/2) + 1 (where n is the number of servers in the full cluster).

    I thought rewriting the if clause that represents the mathematical notation would make it easier for the reader to understand how we check for the majority of the votes. Because integer divisions in Go already perform floor division, we only need to make a small change.

    All the tests are passing this change, so it should be good to merge:

    $ for x (part1 part2 part3); do echo "Testing ${x} ..." && pushd $x && go test ./... && popd; done
    Testing part1 ...
    ok      github.com/eliben/raft  8.656s
    Testing part2 ...
    ok      github.com/eliben/raft  15.168s
    Testing part3 ...
    ok      github.com/eliben/raft  26.035s
    
    opened by fatih 1
  • Part 1: alternative implementation

    Part 1: alternative implementation

    An alternative implementation of part 1, where the operations on the ConsensusModule are encapsulated within a single goroutine.

    The only files with changes are raft.go and server.go.

    opened by gterzian 1
  • Add a fix for a potential leader heartbeat race.

    Add a fix for a potential leader heartbeat race.

    The heartbeat loop invokes leaderSendAEs without holding cm.mu, so in the mean-time this node may have become a follower, and shouldn't be sending heartbeats.

    For #13

    opened by eliben 0
  • Why we do not check errors in AE responses?

    Why we do not check errors in AE responses?

    Hi, Assume that the state of the node is leader and a network partition happened, node send AE (heartbeat) to other nodes and all of the requests failed or timed out, node remain leader. Is this supposed behaviour ?

    question 
    opened by mfatemipour 1
Owner
Eli Bendersky
Eli Bendersky
Fast Raft framework using the Redis protocol for Go

This project has been archived. Please check out Uhaha for a fitter, happier, more productive Raft framework. Finn is a fast and simple framework for

Josh Baker 543 Jul 9, 2022
Go implementation of Count-Min-Log

Count-Min-Log Count-Min-Log sketch: Approximately counting with approximate counters - Guillaume Pitel & Geoffroy Fouquier TL;DR: Count-Min-Log Sketch

Seif Lotfy 57 Apr 6, 2022
A Go implementation of the Elias-Fano encoding

go-ef A Go implementation of the Elias-Fano encoding Example package main import ( "fmt" "github.com/amallia/go-ef" "os" ) func main() {

Antonio Mallia 21 Jul 19, 2022
Set is a useful collection but there is no built-in implementation in Go lang.

goset Set is a useful collection but there is no built-in implementation in Go lang. Why? The only one pkg which provides set operations now is golang

zoumo 49 Aug 2, 2022
A skip list implementation in Go

About This is a library implementing skip lists for the Go programming language (http://golang.org/). Skip lists are a data structure that can be used

Ric (Ryszard) Szopa 236 Jul 9, 2022
Go implementation of C++ STL iterators and algorithms.

iter Go implementation of C++ STL iterators and algorithms. Less hand-written loops, more expressive code. README translations: 简体中文 Motivation Althou

disksing 159 Aug 2, 2022
Go implementation to calculate Levenshtein Distance.

levenshtein Go package to calculate the Levenshtein Distance The library is fully capable of working with non-ascii strings. But the strings are not n

Agniva De Sarker 226 Jul 27, 2022
A Merkle Tree implementation written in Go.

Merkle Tree in Golang An implementation of a Merkle Tree written in Go. A Merkle Tree is a hash tree that provides an efficient way to verify the cont

Cameron Bergoon 352 Jul 31, 2022
A prefix tree implementation in go

Trie (Prefix tree) This library is compatible with Go 1.11+ Please refer to CHANGELOG.md if you encounter breaking changes. Motivation Introduction Us

Viant, Inc 29 Jul 5, 2022
Package ring provides a high performance and thread safe Go implementation of a bloom filter.

ring - high performance bloom filter Package ring provides a high performance and thread safe Go implementation of a bloom filter. Usage Please see th

Tanner Ryan 126 May 10, 2022
A simple Set data structure implementation in Go (Golang) using LinkedHashMap.

Set Set is a simple Set data structure implementation in Go (Golang) using LinkedHashMap. This library allow you to get a set of int64 or string witho

Studio Sol Comunicação Digital Ltda 21 Jul 22, 2022
A Go library for an efficient implementation of a skip list: https://godoc.org/github.com/MauriceGit/skiplist

Fast Skiplist Implementation This Go-library implements a very fast and efficient Skiplist that can be used as direct substitute for a balanced tree o

Maurice Tollmien 210 Jul 21, 2022
A slice-based implementation of a stack. In Go!

Stackgo Stackgo is a slice-based implementation of a simple stack in Go. It uses a pre-alloc pagination strategy which adds little memory overhead to

Alessandro Diaferia 15 Dec 13, 2021
A Left-Leaning Red-Black (LLRB) implementation of balanced binary search trees for Google Go

GoLLRB GoLLRB is a Left-Leaning Red-Black (LLRB) implementation of 2-3 balanced binary search trees in Go Language. Overview As of this writing and to

Petar Maymounkov 727 Jul 24, 2022
Golang implementation of Radix trees

go-radix Provides the radix package that implements a radix tree. The package only provides a single Tree implementation, optimized for sparse nodes.

Armon Dadgar 740 Aug 5, 2022
gtreap is an immutable treap implementation in the Go Language

gtreap gtreap is an immutable treap implementation in the Go Language Overview gtreap implements an immutable treap data structure in golang. By treap

Steve Yen 84 May 17, 2022
An yet-another red-black tree implementation, with a C++ STL-like API.

A red-black tree with an API similar to C++ STL's. INSTALLATION go get github.com/yasushi-saito/rbtree EXAMPLE More examples can be fou

Yasushi Saito 18 Apr 25, 2022
A simple Bloom Filter implementation in Go

This is a simple Bloom filter implementation written in Go. For the theory behind Bloom filters, read http://en.wikipedia.org/wiki/Bloom_filter This

Damian Gryski 16 Apr 26, 2018
HyperLogLog and HyperLogLog++ implementation in Go/Golang.

HyperLogLog and HyperLogLog++ Implements the HyperLogLog and HyperLogLog++ algorithms. HyperLogLog paper: http://algo.inria.fr/flajolet/Publications/F

Clark DuVall 408 Jun 24, 2022